This is a starter RMarkdown template to accompany Data Visualization (Princeton University Press, 2019). You can use it to take notes, write your code, and produce a good-looking, reproducible document that records the work you have done. At the very top of the file is a section of metadata, or information about what the file is and what it does. The metadata is delimited by three dashes at the start and another three at the end. You should change the title, author, and date to the values that suit you. Keep the output line as it is for now, however. Each line in the metadata has a structure. First the key (“title”, “author”, etc), then a colon, and then the value associated with the key.
Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. A code chunk is a specially delimited section of the file. You can add one by moving the cursor to a blank line choosing Code > Insert Chunk from the RStudio menu. When you do, an empty chunk will appear:
Code chunks are delimited by three backticks (found to the left of the 1 key on US and UK keyboards) at the start and end. The opening backticks also have a pair of braces and the letter r, to indicate what language the chunk is written in. You write your code inside the code chunks. Write your notes and other material around them, as here.
To install the tidyverse, make sure you have an Internet connection. Then manually run the code in the chunk below. If you knit the document if will be skipped. We do this because you only need to install these packages once, not every time you run this file. Either knit the chunk using the little green “play” arrow to the right of the chunk area, or copy and paste the text into the console window.
## This code will not be evaluated automatically.
## (Notice the eval = FALSE declaration in the options section of the
## code chunk)
my_packages <- c("tidyverse", "broom", "coefplot", "colorblindr" "cowplot",
"dichromat" "gapminder", "GGally", "ggrepel", "ggridges",
"ggthemes", "gridExtra", "here", "interplot", "margins",
"maps", "mapproj", "mapdata", "MASS", "quantreg",
"rlang", "scales", "survey", "srvyr", "statebins",
"viridis", "viridisLite", "devtools")
install.packages(my_packages, repos = "http://cran.rstudio.com")
devtools::install_github("kjhealy/socviz", "hrbrmstr/hrbrthemes", "kjhealy/myriad")
To begin we must load some libraries we will be using. If we do not load them, R will not be able to find the functions contained in these libraries. The tidyverse includes ggplot and other tools. We also load the socviz and gapminder libraries.
Notice that here, the braces at the start of the code chunk have some additional options set in them. There is the language, r, as before. This is required. Then there is the word setup, which is a label for your code chunk. Labels are useful to briefly say what the chunk does. Label names must be unique (no two chunks in the same document can have the same label) and cannot contain spaces. Then, after the comma, an option is set: include=FALSE. This tells R to run this code but not to include the output in the final document.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
gapminder
## # A tibble: 1,704 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## # ... with 1,694 more rows
The remainder of this document contains the chapter headings for the book, and an empty code chunk in each section to get you started. Try knitting this document now by clicking the “Knit” button in the RStudio toolbar, or choosing File > Knit Document from the RStudio menu.
my_numbers <- c(1, 1, 4, 1, 1, 4, 1)
my_numbers
## [1] 1 1 4 1 1 4 1
4 + 1
## [1] 5
letters
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
your_numbers <- c(5, 31, 71, 1, 3, 21, 6)
my_numbers
## [1] 1 2 3 1 3 5 25
mean(x = my_numbers)
## [1] 5.714286
mean(x = your_numbers)
## [1] 19.71429
mean(my_numbers)
## [1] 5.714286
my_summary <- summary(my_numbers)
my_summary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.500 3.000 5.714 4.000 25.000
table(my_numbers)
## my_numbers
## 1 2 3 5 25
## 2 1 2 1 1
sd(my_numbers)
## [1] 8.616153
my_numbers*5
## [1] 5 10 15 5 15 25 125
my_numbers + 1
## [1] 2 3 4 2 4 6 26
my_numbers + your_numbers
## [1] 6 33 74 2 6 26 31
class(my_numbers)
## [1] "numeric"
class(my_summary)
## [1] "summaryDefault" "table"
class(summary)
## [1] "function"
my_new_vector <- c(my_numbers, "Apple")
my_new_vector
## [1] "1" "2" "3" "1" "3" "5" "25" "Apple"
class(my_new_vector)
## [1] "character"
titanic
## fate sex n percent
## 1 perished male 1364 62.0
## 2 perished female 126 5.7
## 3 survived male 367 16.7
## 4 survived female 344 15.6
class(titanic)
## [1] "data.frame"
titanic$percent
## [1] 62.0 5.7 16.7 15.6
titanic_tb <- as_tibble(titanic)
titanic_tb
## # A tibble: 4 x 4
## fate sex n percent
## <fct> <fct> <dbl> <dbl>
## 1 perished male 1364 62
## 2 perished female 126 5.7
## 3 survived male 367 16.7
## 4 survived female 344 15.6
str(my_numbers)
## num [1:7] 1 2 3 1 3 5 25
str(my_summary)
## 'summaryDefault' Named num [1:6] 1 1.5 3 5.71 4 ...
## - attr(*, "names")= chr [1:6] "Min." "1st Qu." "Median" "Mean" ...
url <- "https://cdn.rawgit.com/kjhealy/viz-organdata/master/organdonation.csv"
organs <- read_csv(file = url)
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## country = col_character(),
## world = col_character(),
## opt = col_character(),
## consent.law = col_character(),
## consent.practice = col_character(),
## consistent = col_character(),
## ccode = col_character()
## )
## i Use `spec()` for the full column specifications.
library(gapminder)
gapminder
## # A tibble: 1,704 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## # ... with 1,694 more rows
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point()
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + geom_smooth(method = "gam") + scale_x_log10()
## `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + geom_smooth(method = "lm") +
scale_x_log10(labels = scales::dollar)
## `geom_smooth()` using formula 'y ~ x'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = "purple"))
p + geom_point() +
geom_smooth(method = "loess") +
scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(color = "purple") +
geom_smooth(method = "loess") +
scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(alpha = 0.3) +
geom_smooth(color = "orange", se = FALSE, size = 2, method = "lm") +
scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(alpha = 0.3) +
geom_smooth(method = "gam") +
scale_x_log10(labels = scales::dollar) +
labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
title = "Economic Growth and Life Expectancy",
subtitle = "Data points are country-years",
caption = "Source: Gapiminder.")
## `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent))
p + geom_point() +
geom_smooth(method = "loess") +
scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent, fill = continent))
p + geom_point() +
geom_smooth(method = "loess") +
scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(mapping = aes(color = continent)) +
geom_smooth(method = "loess") +
scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(mapping = aes(color = log(pop))) + scale_x_log10()
p_out <- p + geom_point() + geom_smooth(method = "loess") + scale_x_log10()
ggsave(filename = "my_figure.png", plot = p_out)
## Saving 12 x 9 in image
## `geom_smooth()` using formula 'y ~ x'
here()
## [1] "C:/Users/mdabrowski/Documents/R/dataVisualization"
ggsave(here("figures", "life_vs_gdp_gradient.pdf"), plot = p_out,
height = 8, width = 10, units = "in")
## `geom_smooth()` using formula 'y ~ x'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
p + geom_smooth() + geom_point()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
p <- ggplot(data = gapminder, mapping = aes(x = pop, y = lifeExp))
p + geom_point()
p + geom_point() + scale_x_log10()
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + scale_x_sqrt()
p + geom_point() + scale_x_reverse()
p + geom_point() + scale_y_log10()
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = year))
p + geom_point()
gapminder
## # A tibble: 1,704 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## # ... with 1,694 more rows
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = factor(year)))
p + geom_point()
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = factor(year)))
p + geom_point() +
geom_smooth(se = FALSE, method = "lm") +
scale_x_log10(labels = scales::dollar) +
labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
title = "Economic Growth and Life Expectancy",
subtitle = "Data points are country-years",
caption = "Source: Gapiminder.")
## `geom_smooth()` using formula 'y ~ x'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = country))
p + geom_point() +
geom_smooth(se = FALSE, method = "lm") +
scale_x_log10(labels = scales::dollar) +
labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
title = "Economic Growth and Life Expectancy",
subtitle = "Data points are country-years",
caption = "Source: Gapiminder.") +
theme(legend.position = "bottom")
## `geom_smooth()` using formula 'y ~ x'
p <- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
p + geom_line()
p <- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
p + geom_line(aes(group = country))
p <- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
p + geom_line(aes(group = country)) + facet_wrap(~ continent)
p <- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
p + geom_line(color = "gray70", aes(group = country)) +
geom_smooth(size = 1.1, method = "loess", se = FALSE) +
scale_y_log10(labels = scales::dollar) +
facet_wrap(~ continent, ncol = 5) +
labs(x = "Year", y = "GDP per capita", title = "GDP per capita on Five Continents")
## `geom_smooth()` using formula 'y ~ x'
glimpse(gss_sm)
## Rows: 2,867
## Columns: 32
## $ year <dbl> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2...
## $ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, ...
## $ ballot <labelled> 1, 2, 3, 1, 3, 2, 1, 3, 1, 3, 2, 1, 2, 3, 2, 3, 3,...
## $ age <dbl> 47, 61, 72, 43, 55, 53, 50, 23, 45, 71, 33, 86, 32, 60,...
## $ childs <dbl> 3, 0, 2, 4, 2, 2, 2, 3, 3, 4, 5, 4, 3, 5, 7, 2, 6, 5, 0...
## $ sibs <labelled> 2, 3, 3, 3, 2, 2, 2, 6, 5, 1, 4, 4, 3, 6, 0, 1, 3,...
## $ degree <fct> Bachelor, High School, Bachelor, High School, Graduate,...
## $ race <fct> White, White, White, White, White, White, White, Other,...
## $ sex <fct> Male, Male, Male, Female, Female, Female, Male, Female,...
## $ region <fct> New England, New England, New England, New England, New...
## $ income16 <fct> $170000 or over, $50000 to 59999, $75000 to $89999, $17...
## $ relig <fct> None, None, Catholic, Catholic, None, None, None, Catho...
## $ marital <fct> Married, Never Married, Married, Married, Married, Marr...
## $ padeg <fct> Graduate, Lt High School, High School, NA, Bachelor, NA...
## $ madeg <fct> High School, High School, Lt High School, High School, ...
## $ partyid <fct> "Independent", "Ind,near Dem", "Not Str Republican", "N...
## $ polviews <fct> Moderate, Liberal, Conservative, Moderate, Slightly Lib...
## $ happy <fct> Pretty Happy, Pretty Happy, Very Happy, Pretty Happy, V...
## $ partners <fct> NA, 1 Partner, 1 Partner, NA, 1 Partner, 1 Partner, NA,...
## $ grass <fct> NA, Legal, Not Legal, NA, Legal, Legal, NA, Not Legal, ...
## $ zodiac <fct> Aquarius, Scorpio, Pisces, Cancer, Scorpio, Scorpio, Ca...
## $ pres12 <labelled> 3, 1, 2, 2, 1, 1, NA, NA, NA, 2, NA, NA, 1, 1, 2, ...
## $ wtssall <dbl> 0.9569935, 0.4784968, 0.9569935, 1.9139870, 1.4354903, ...
## $ income_rc <fct> Gt $170000, Gt $50000, Gt $75000, Gt $170000, Gt $17000...
## $ agegrp <fct> Age 45-55, Age 55-65, Age 65+, Age 35-45, Age 45-55, Ag...
## $ ageq <fct> Age 34-49, Age 49-62, Age 62+, Age 34-49, Age 49-62, Ag...
## $ siblings <fct> 2, 3, 3, 3, 2, 2, 2, 6+, 5, 1, 4, 4, 3, 6+, 0, 1, 3, 6+...
## $ kids <fct> 3, 0, 2, 4+, 2, 2, 2, 3, 3, 4+, 4+, 4+, 3, 4+, 4+, 2, 4...
## $ religion <fct> None, None, Catholic, Catholic, None, None, None, Catho...
## $ bigregion <fct> Northeast, Northeast, Northeast, Northeast, Northeast, ...
## $ partners_rc <fct> NA, 1, 1, NA, 1, 1, NA, 1, NA, 3, 1, NA, 1, NA, 0, 1, 0...
## $ obama <dbl> 0, 1, 0, 0, 1, 1, NA, NA, NA, 0, NA, NA, 1, 1, 0, 1, 0,...
p <- ggplot(data = gss_sm, mapping = aes(x = age, y = childs))
p + geom_point(alpha = 0.2) + geom_smooth() + facet_grid(sex ~ race)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning: Removed 18 rows containing missing values (geom_point).
p <- ggplot(data = gss_sm, mapping = aes(x = age, y = childs))
p + geom_point(alpha = 0.2) + geom_smooth() + facet_grid(sex ~ race + degree)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 62.87
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 2.13
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 582.26
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : span too small. fewer
## data values than degrees of freedom.
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
## 62.87
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius 2.13
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
## number 0
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : There are other near
## singularities as well. 582.26
## Warning: Removed 18 rows containing missing values (geom_point).
## Warning in max(ids, na.rm = TRUE): brak argumentów w max; zwracanie wartości -
## Inf
p <- ggplot(data = gss_sm, mapping = aes(x = bigregion))
p + geom_bar()
p <- ggplot(data = gss_sm, mapping = aes(x = bigregion))
p + geom_bar(mapping = aes(y = ..prop..))
p <- ggplot(data = gss_sm, mapping = aes(x = bigregion))
p + geom_bar(mapping = aes(y = ..prop.., group = 1))
table(gss_sm$religion)
##
## Protestant Catholic Jewish None Other
## 1371 649 51 619 159
p <- ggplot(data = gss_sm, mapping = aes(x = religion, color = religion))
p + geom_bar()
p <- ggplot(data = gss_sm, mapping = aes(x = religion, fill = religion))
p + geom_bar() + guides(fill = FALSE)
p <- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar()
p <- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar(position = "fill")
p <- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar(position = "dodge", mapping = aes(y = ..prop..))
p <- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar(position = "dodge", mapping = aes(y = ..prop.., group = religion))
p <- ggplot(data = gss_sm, mapping = aes(x = religion))
p + geom_bar(position = "dodge", mapping = aes(y = ..prop.., group = bigregion)) + facet_wrap(~ bigregion, ncol = 2)
p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_histogram(bins = 10)
oh_wi <- c("OH", "WI")
p <- ggplot(data = subset(midwest, subset = state %in% oh_wi), mapping = aes(x = percollege, fill = state))
p + geom_histogram(alpha = 0.4, bins = 20)
p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_density()
p <- ggplot(data = midwest, mapping = aes(x = area, fill = state, color = state))
p + geom_density(alpha = 0.3)
p <- ggplot(data = midwest, mapping = aes(x = area, color = state))
p + geom_line(stat="density")
p <- ggplot(data = subset(midwest, subset = state %in% oh_wi), mapping = aes(x = area, fill = state, color = state))
p + geom_density(alpha = 0.3, mapping = aes(y = ..scaled..))
p <- ggplot(data = subset(midwest, subset = state %in% oh_wi), mapping = aes(x = area, fill = state, color = state))
p + geom_density(alpha = 0.3, mapping = aes(y = ..count..))
titanic
## fate sex n percent
## 1 perished male 1364 62.0
## 2 perished female 126 5.7
## 3 survived male 367 16.7
## 4 survived female 344 15.6
p <- ggplot(data = titanic, mapping = aes(x = fate, y = percent, fill = sex))
p + geom_bar(position = "dodge", stat = "identity") + theme(legend.position = "top")
oecd_sum
## # A tibble: 57 x 5
## # Groups: year [57]
## year other usa diff hi_lo
## <int> <dbl> <dbl> <dbl> <chr>
## 1 1960 68.6 69.9 1.3 Below
## 2 1961 69.2 70.4 1.2 Below
## 3 1962 68.9 70.2 1.30 Below
## 4 1963 69.1 70 0.9 Below
## 5 1964 69.5 70.3 0.800 Below
## 6 1965 69.6 70.3 0.7 Below
## 7 1966 69.9 70.3 0.400 Below
## 8 1967 70.1 70.7 0.6 Below
## 9 1968 70.1 70.4 0.3 Below
## 10 1969 70.1 70.6 0.5 Below
## # ... with 47 more rows
p <- ggplot(data = oecd_sum, mapping = aes(x = year, y = diff, fill = hi_lo))
p + geom_col() + guides(fill = FALSE) +
labs(x = NULL, y = "Difference in Years", title = "The US Life Expectancy Gap",
subtitle = "Difference between US and OECD average life expectancies, 1960-2015",
caption = "Data: OECD. After a chart by Christopher Ingraham, Washington Post, December 27th 2017.")
## Warning: Removed 1 rows containing missing values (position_stack).
p <- ggplot(data = oecd_sum, mapping = aes(x = year, y = diff, fill = hi_lo))
p + geom_bar(position = "identity", stat = "identity") + guides(fill = FALSE) +
labs(x = NULL, y = "Difference in Years", title = "The US Life Expectancy Gap",
subtitle = "Difference between US and OECD average life expectancies, 1960-2015",
caption = "Data: OECD. After a chart by Christopher Ingraham, Washington Post, December 27th 2017.")
## Warning: Removed 1 rows containing missing values (geom_bar).
p <- ggplot(data = gapminder, mapping = aes(x = pop, y = gdpPercap))
p + geom_point(mapping = aes(color = continent)) + scale_x_log10() + scale_y_log10(labels = scales::dollar) + facet_wrap(~ year)
p <- ggplot(data = gapminder, mapping = aes(x = year, y = pop))
p_out <- p + geom_line(aes(group = country)) + scale_y_log10() + facet_wrap(~country)
ggsave(here("figures", "pop_vs_year_countries.pdf"), plot = p_out,
height = 12, width = 20, units = "in")
p <- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
p_out <- p + geom_line(aes(group = country)) + scale_y_log10() + facet_wrap(~country)
ggsave(here("figures", "dgpPercap_vs_year_countries.pdf"), plot = p_out,
height = 12, width = 20, units = "in")
p <- ggplot(data = gss_sm, mapping = aes(x = age, y = childs))
p + geom_point(alpha = 0.2) + geom_smooth() + facet_grid(sex ~ race)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning: Removed 18 rows containing missing values (geom_point).
p <- ggplot(data = gss_sm, mapping = aes(x = age, y = childs))
p + geom_point(alpha = 0.2) + geom_smooth() + facet_grid(~ sex + race)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning: Removed 18 rows containing missing values (geom_point).
p <- ggplot(data = gss_sm, mapping = aes(x = age, y = childs))
p + geom_point(alpha = 0.2) + geom_smooth() + facet_wrap(~ sex + race)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning: Removed 18 rows containing missing values (geom_point).
p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_freqpoly(size = 1.5,)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_freqpoly(size = 1.5, bins = 10)
oh_wi <- c("OH", "WI")
p <- ggplot(data = subset(midwest, subset = state %in% oh_wi), mapping = aes(x = percollege, color = state))
p + geom_freqpoly(alpha = 0.4, size = 1.5, bins = 20)
p <- ggplot(data = gapminder, mapping = aes(x = lifeExp, y = gdpPercap))
p + geom_bin2d(bins = c(20, 50)) # binwidth = 5
p <- ggplot(data = midwest, mapping = aes(x = percollege, y = percbelowpoverty))
p + geom_density_2d()
p <- ggplot(data = midwest, mapping = aes(x = percollege, y = percbelowpoverty))
p + geom_density_2d() + geom_point()
rel_by_region <- gss_sm %>%
group_by(bigregion, religion) %>%
summarize(N = n()) %>%
mutate(freq = N / sum(N),
pct = round(freq*100, 0))
## `summarise()` has grouped output by 'bigregion'. You can override using the `.groups` argument.
rel_by_region
## # A tibble: 24 x 5
## # Groups: bigregion [4]
## bigregion religion N freq pct
## <fct> <fct> <int> <dbl> <dbl>
## 1 Northeast Protestant 158 0.324 32
## 2 Northeast Catholic 162 0.332 33
## 3 Northeast Jewish 27 0.0553 6
## 4 Northeast None 112 0.230 23
## 5 Northeast Other 28 0.0574 6
## 6 Northeast <NA> 1 0.00205 0
## 7 Midwest Protestant 325 0.468 47
## 8 Midwest Catholic 172 0.247 25
## 9 Midwest Jewish 3 0.00432 0
## 10 Midwest None 157 0.226 23
## # ... with 14 more rows
rel_by_region %>% group_by(bigregion) %>% summarize(total = sum(pct))
## # A tibble: 4 x 2
## bigregion total
## * <fct> <dbl>
## 1 Northeast 100
## 2 Midwest 101
## 3 South 100
## 4 West 101
p <- ggplot(rel_by_region, aes(x = bigregion, y = pct, fill = religion))
p + geom_col(position = "dodge") +
labs(x = "Region", y = "Percent", fill = "Religion") +
theme(legend.position = "top")
p <- ggplot(rel_by_region, aes(x = bigregion, y = pct, fill = religion))
p + geom_col(position = "dodge2") +
labs(x = "Region", y = "Percent", fill = "Religion") +
theme(legend.position = "top")
p <- ggplot(rel_by_region, aes(x = religion, y = pct, fill = religion))
p + geom_col(position = "dodge2") +
labs(x = NULL, y = "Percent", fill = "Religion") +
guides(fill = FALSE) +
coord_flip() +
facet_grid(~ bigregion)
head(organdata)
## # A tibble: 6 x 21
## country year donors pop pop_dens gdp gdp_lag health health_lag
## <chr> <date> <dbl> <int> <dbl> <int> <int> <dbl> <dbl>
## 1 Austra~ NA NA 17065 0.220 16774 16591 1300 1224
## 2 Austra~ 1991-01-01 12.1 17284 0.223 17171 16774 1379 1300
## 3 Austra~ 1992-01-01 12.4 17495 0.226 17914 17171 1455 1379
## 4 Austra~ 1993-01-01 12.5 17667 0.228 18883 17914 1540 1455
## 5 Austra~ 1994-01-01 10.2 17855 0.231 19849 18883 1626 1540
## 6 Austra~ 1995-01-01 10.2 18072 0.233 21079 19849 1737 1626
## # ... with 12 more variables: pubhealth <dbl>, roads <dbl>, cerebvas <int>,
## # assault <int>, external <int>, txp_pop <dbl>, world <chr>, opt <chr>,
## # consent_law <chr>, consent_practice <chr>, consistent <chr>, ccode <chr>
organdata %>% select(1:6) %>% sample_n(size = 10)
## # A tibble: 10 x 6
## country year donors pop pop_dens gdp
## <chr> <date> <dbl> <int> <dbl> <int>
## 1 Ireland 1991-01-01 19 3534 5.03 13495
## 2 United Kingdom 2000-01-01 13.2 58817 24.2 25271
## 3 Belgium 2001-01-01 22.2 10287 31.1 27113
## 4 Finland 2002-01-01 17.1 5201 1.54 26616
## 5 United Kingdom 1999-01-01 12.1 58635 24.1 24086
## 6 Switzerland 1992-01-01 14.8 6875 16.7 25135
## 7 Ireland 2001-01-01 18.2 3863 5.50 29703
## 8 United States 1992-01-01 17.6 256514 2.66 24411
## 9 Ireland 1997-01-01 20.9 3673 5.23 22017
## 10 Netherlands 1995-01-01 15.2 15459 37.2 21723
p <- ggplot(data = organdata, mapping = aes(x = year, y = donors))
p + geom_point()
## Warning: Removed 34 rows containing missing values (geom_point).
p <- ggplot(data = organdata, mapping = aes(x = year, y = donors))
p + geom_line(aes(group = country)) + facet_wrap(~ country)
## Warning: Removed 34 row(s) containing missing values (geom_path).
p <- ggplot(data = organdata, mapping = aes(x = country, y = donors))
p + geom_boxplot() # outlier.shape, outlier.color
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
p <- ggplot(data = organdata, mapping = aes(x = country, y = donors))
p + geom_boxplot() + coord_flip()
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors))
p + geom_boxplot() + labs(x = NULL) + coord_flip()
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors))
p + geom_violin() + labs(x = NULL) + coord_flip()
## Warning: Removed 34 rows containing non-finite values (stat_ydensity).
p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, sd, na.rm = TRUE), y = donors))
p + geom_boxplot() + labs(x = NULL) + coord_flip()
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, median, na.rm = TRUE), y = donors))
p + geom_boxplot() + labs(x = NULL) + coord_flip()
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors, fill = world))
p + geom_boxplot() + labs(x = NULL) + coord_flip() +
theme(legend.position = "top")
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors, color = world))
p + geom_point() + labs(x = NULL) + coord_flip() +
theme(legend.position = "top")
## Warning: Removed 34 rows containing missing values (geom_point).
p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors, color = world))
p + geom_jitter() + labs(x = NULL) + coord_flip() +
theme(legend.position = "top")
## Warning: Removed 34 rows containing missing values (geom_point).
p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors, color = world))
p + geom_jitter(position = position_jitter(width = 0.15)) + labs(x = NULL) + coord_flip() +
theme(legend.position = "top")
## Warning: Removed 34 rows containing missing values (geom_point).
p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors, color = world))
p + geom_jitter(position = position_jitter(height = 0.15)) + labs(x = NULL) + coord_flip() +
theme(legend.position = "top")
## Warning: Removed 34 rows containing missing values (geom_point).
glimpse(organdata)
## Rows: 238
## Columns: 21
## $ country <chr> "Australia", "Australia", "Australia", "Australia"...
## $ year <date> NA, 1991-01-01, 1992-01-01, 1993-01-01, 1994-01-0...
## $ donors <dbl> NA, 12.09, 12.35, 12.51, 10.25, 10.18, 10.59, 10.2...
## $ pop <int> 17065, 17284, 17495, 17667, 17855, 18072, 18311, 1...
## $ pop_dens <dbl> 0.2204433, 0.2232723, 0.2259980, 0.2282198, 0.2306...
## $ gdp <int> 16774, 17171, 17914, 18883, 19849, 21079, 21923, 2...
## $ gdp_lag <int> 16591, 16774, 17171, 17914, 18883, 19849, 21079, 2...
## $ health <dbl> 1300, 1379, 1455, 1540, 1626, 1737, 1846, 1948, 20...
## $ health_lag <dbl> 1224, 1300, 1379, 1455, 1540, 1626, 1737, 1846, 19...
## $ pubhealth <dbl> 4.8, 5.4, 5.4, 5.4, 5.4, 5.5, 5.6, 5.7, 5.9, 6.1, ...
## $ roads <dbl> 136.59537, 122.25179, 112.83224, 110.54508, 107.98...
## $ cerebvas <int> 682, 647, 630, 611, 631, 592, 576, 525, 516, 493, ...
## $ assault <int> 21, 19, 17, 18, 17, 16, 17, 17, 16, 15, 16, 15, 14...
## $ external <int> 444, 425, 406, 376, 387, 371, 395, 385, 410, 409, ...
## $ txp_pop <dbl> 0.9375916, 0.9257116, 0.9145470, 0.9056433, 0.8961...
## $ world <chr> "Liberal", "Liberal", "Liberal", "Liberal", "Liber...
## $ opt <chr> "In", "In", "In", "In", "In", "In", "In", "In", "I...
## $ consent_law <chr> "Informed", "Informed", "Informed", "Informed", "I...
## $ consent_practice <chr> "Informed", "Informed", "Informed", "Informed", "I...
## $ consistent <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "...
## $ ccode <chr> "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "O...
by_country <- organdata %>% group_by(consent_law, country) %>%
summarize(donors_mean = mean(donors, na.rm = TRUE),
donors_sd = sd(donors, na.rm = TRUE),
gdp_mean = mean(gdp, na.rm = TRUE),
health_mean = mean(health, na.rm = TRUE),
roads_mean = mean(roads, na.rm = TRUE),
cerebvas_mean = mean(cerebvas, na.rm = TRUE))
## `summarise()` has grouped output by 'consent_law'. You can override using the `.groups` argument.
by_country
## # A tibble: 17 x 8
## # Groups: consent_law [2]
## consent_law country donors_mean donors_sd gdp_mean health_mean roads_mean
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Informed Austra~ 10.6 1.14 22179. 1958. 105.
## 2 Informed Canada 14.0 0.751 23711. 2272. 109.
## 3 Informed Denmark 13.1 1.47 23722. 2054. 102.
## 4 Informed Germany 13.0 0.611 22163. 2349. 113.
## 5 Informed Ireland 19.8 2.48 20824. 1480. 118.
## 6 Informed Nether~ 13.7 1.55 23013. 1993. 76.1
## 7 Informed United~ 13.5 0.775 21359. 1561. 67.9
## 8 Informed United~ 20.0 1.33 29212. 3988. 155.
## 9 Presumed Austria 23.5 2.42 23876. 1875. 150.
## 10 Presumed Belgium 21.9 1.94 22500. 1958. 155.
## 11 Presumed Finland 18.4 1.53 21019. 1615. 93.6
## 12 Presumed France 16.8 1.60 22603. 2160. 156.
## 13 Presumed Italy 11.1 4.28 21554. 1757 122.
## 14 Presumed Norway 15.4 1.11 26448. 2217. 70.0
## 15 Presumed Spain 28.1 4.96 16933 1289. 161.
## 16 Presumed Sweden 13.1 1.75 22415. 1951. 72.3
## 17 Presumed Switze~ 14.2 1.71 27233 2776. 96.4
## # ... with 1 more variable: cerebvas_mean <dbl>
by_year <- organdata %>% group_by(consent_law, year) %>%
summarize(donors_mean = mean(donors, na.rm = TRUE),
donors_sd = sd(donors, na.rm = TRUE),
gdp_mean = mean(gdp, na.rm = TRUE),
health_mean = mean(health, na.rm = TRUE),
roads_mean = mean(roads, na.rm = TRUE),
cerebvas_mean = mean(cerebvas, na.rm = TRUE))
## `summarise()` has grouped output by 'consent_law'. You can override using the `.groups` argument.
by_year
## # A tibble: 26 x 8
## # Groups: consent_law [2]
## consent_law year donors_mean donors_sd gdp_mean health_mean roads_mean
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Informed 1991-01-01 14.7 2.58 18160. 1642. 121.
## 2 Informed 1992-01-01 15.2 2.44 19001. 1752. 113.
## 3 Informed 1993-01-01 15.0 1.99 19657. 1829. 111.
## 4 Informed 1994-01-01 14.5 3.49 20724. 1912. 108.
## 5 Informed 1995-01-01 15.6 4.60 21789. 1996. 110.
## 6 Informed 1996-01-01 14.6 2.85 22724. 2080. 105.
## 7 Informed 1997-01-01 14.7 3.82 24084. 2169. 102.
## 8 Informed 1998-01-01 14.8 4.86 25214. 2273. 97.9
## 9 Informed 1999-01-01 14.1 3.99 26458 2404. 96.2
## 10 Informed 2000-01-01 14.4 3.52 27841. 2532. 95.7
## # ... with 16 more rows, and 1 more variable: cerebvas_mean <dbl>
by_country <- organdata %>% group_by(consent_law, country) %>%
summarize_if(is.numeric, funs(mean, sd), na.rm = TRUE) %>%
ungroup()
## Warning: `funs()` is deprecated as of dplyr 0.8.0.
## Please use a list of either functions or lambdas:
##
## # Simple named list:
## list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`:
## tibble::lst(mean, median)
##
## # Using lambdas
## list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
by_country
## # A tibble: 17 x 28
## consent_law country donors_mean pop_mean pop_dens_mean gdp_mean gdp_lag_mean
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Informed Austra~ 10.6 18318. 0.237 22179. 21779.
## 2 Informed Canada 14.0 29608. 0.297 23711. 23353.
## 3 Informed Denmark 13.1 5257. 12.2 23722. 23275
## 4 Informed Germany 13.0 80255. 22.5 22163. 21938.
## 5 Informed Ireland 19.8 3674. 5.23 20824. 20154.
## 6 Informed Nether~ 13.7 15548. 37.4 23013. 22554.
## 7 Informed United~ 13.5 58187. 24.0 21359. 20962.
## 8 Informed United~ 20.0 269330. 2.80 29212. 28699.
## 9 Presumed Austria 23.5 7927. 9.45 23876. 23415.
## 10 Presumed Belgium 21.9 10153. 30.7 22500. 22096.
## 11 Presumed Finland 18.4 5112. 1.51 21019. 20763
## 12 Presumed France 16.8 58056. 10.5 22603. 22211.
## 13 Presumed Italy 11.1 57360. 19.0 21554. 21195.
## 14 Presumed Norway 15.4 4386. 1.35 26448. 25769.
## 15 Presumed Spain 28.1 39666. 7.84 16933 16584.
## 16 Presumed Sweden 13.1 8789. 1.95 22415. 22094
## 17 Presumed Switze~ 14.2 7037. 17.0 27233 26931.
## # ... with 21 more variables: health_mean <dbl>, health_lag_mean <dbl>,
## # pubhealth_mean <dbl>, roads_mean <dbl>, cerebvas_mean <dbl>,
## # assault_mean <dbl>, external_mean <dbl>, txp_pop_mean <dbl>,
## # donors_sd <dbl>, pop_sd <dbl>, pop_dens_sd <dbl>, gdp_sd <dbl>,
## # gdp_lag_sd <dbl>, health_sd <dbl>, health_lag_sd <dbl>, pubhealth_sd <dbl>,
## # roads_sd <dbl>, cerebvas_sd <dbl>, assault_sd <dbl>, external_sd <dbl>,
## # txp_pop_sd <dbl>
p <- ggplot(data = by_country, mapping = aes(x = donors_mean, y = reorder(country, donors_mean), color = consent_law))
p + geom_point(size = 3) +
labs(x = "Donor Procurement Rate", y = NULL, color = "Consent Law") +
theme(legend.position = "top")
p <- ggplot(data = by_country, mapping = aes(x = donors_mean, y = reorder(country, donors_mean)))
p + geom_point(size = 3) +
facet_wrap(~ consent_law, scales = "free_y", ncol = 1) +
labs(x = "Donor Procurement Rate", y = NULL)
p <- ggplot(data = by_country, mapping = aes(x = reorder(country, donors_mean), y = donors_mean))
p + geom_pointrange(mapping = aes(ymin = donors_mean - donors_sd, ymax = donors_mean + donors_sd)) +
labs(x = NULL, y = "Donor Procurement Rate") + coord_flip()
p <- ggplot(data = by_country, mapping = aes(x = roads_mean, y = donors_mean))
p + geom_point() + geom_text(mapping = aes(label = country))
p <- ggplot(data = by_country, mapping = aes(x = roads_mean, y = donors_mean))
p + geom_point() + geom_text(mapping = aes(label = country), hjust = 0)
p <- ggplot(data = by_country, mapping = aes(x = roads_mean, y = donors_mean))
p + geom_point() + geom_text(mapping = aes(label = country), hjust = 1)
library(ggrepel)
elections_historic %>% select(2:7)
## # A tibble: 49 x 6
## year winner win_party ec_pct popular_pct popular_margin
## <int> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1824 John Quincy Adams D.-R. 0.322 0.309 -0.104
## 2 1828 Andrew Jackson Dem. 0.682 0.559 0.122
## 3 1832 Andrew Jackson Dem. 0.766 0.547 0.178
## 4 1836 Martin Van Buren Dem. 0.578 0.508 0.142
## 5 1840 William Henry Harrison Whig 0.796 0.529 0.0605
## 6 1844 James Polk Dem. 0.618 0.495 0.0145
## 7 1848 Zachary Taylor Whig 0.562 0.473 0.0479
## 8 1852 Franklin Pierce Dem. 0.858 0.508 0.0695
## 9 1856 James Buchanan Dem. 0.588 0.453 0.122
## 10 1860 Abraham Lincoln Rep. 0.594 0.396 0.101
## # ... with 39 more rows
p_title <- "Presidential Elections: Popular & Electoral College Margins"
p_subtitle <- "1824-2016"
p_caption <- "Data for 2016 are provisional."
x_label <- "Winner's share of Popular Vote"
y_label <- "Winner's share of Electoral College Votes"
p <- ggplot(elections_historic, aes(x = popular_pct, y = ec_pct, label = winner_label))
p + geom_hline(yintercept = 0.5, size = 1.4, color = "gray80") +
geom_vline(xintercept = 0.5, size = 1.4, color = "gray80") +
geom_abline(color = "gray80") +
geom_point() +
geom_text_repel() +
scale_x_continuous(labels = scales::percent) +
scale_y_continuous(labels = scales::percent) +
labs(x = x_label, y = y_label, title = p_title, subtitle = p_subtitle, caption = p_caption)
## Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
library(ggrepel)
p <- ggplot(data = by_country, mapping = aes(x = gdp_mean, y = health_mean))
p + geom_point() +
geom_text_repel(data = subset(by_country, gdp_mean > 25000),
mapping = aes(label = country))
p <- ggplot(data = by_country, mapping = aes(x = gdp_mean, y = health_mean))
p + geom_point() +
geom_text_repel(data = subset(by_country, gdp_mean > 25000 | health_mean < 1500 | country %in% "Belgium"),
mapping = aes(label = country))
organdata$ind <- organdata$ccode %in% c("Ita", "Spa") & organdata$year > 1998
p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = ind))
p + geom_point() +
geom_text_repel(data = subset(organdata, ind), mapping = aes(label = ccode)) +
guides(label = FALSE, color = FALSE)
## Warning: Removed 34 rows containing missing values (geom_point).
p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors))
p + geom_point(aes(color = ind)) +
geom_text_repel(data = subset(organdata, ind), mapping = aes(label = ccode)) +
guides(label = FALSE, color = FALSE)
## Warning: Removed 34 rows containing missing values (geom_point).
p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors))
p + geom_point() +
annotate(geom = "text", x = 91, y = 33, label = "A suprisingly high \n recovery rate.", hjust = 0)
## Warning: Removed 34 rows containing missing values (geom_point).
p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors))
p + geom_point() +
annotate(geom = "rect", xmin = 125, xmax = 155, ymin = 30, ymax = 35, fill = "red", alpha = 0.2) +
annotate(geom = "text", x = 157, y = 33, label = "A suprisingly high \n recovery rate.", hjust = 0)
## Warning: Removed 34 rows containing missing values (geom_point).
# scale_<MAPPING>_<KIND>()
p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = world))
p + geom_point()
## Warning: Removed 34 rows containing missing values (geom_point).
p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = world))
p + geom_point() +
scale_x_log10() +
scale_y_continuous(breaks = c(5, 15, 25), labels = c("Five", "Fifteen", "Twenty Five"))
## Warning: Removed 34 rows containing missing values (geom_point).
p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = world))
p + geom_point() +
scale_color_discrete(labels = c("Corporatist", "Liberal", "Social Democratic", "Unclassified")) +
labs(x = "Road Deaths", y = "Donor Procurement", color = "Welfare State")
## Warning: Removed 34 rows containing missing values (geom_point).
p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = world))
p + geom_point() +
labs(x = "Road Deaths", y = "Donor Procurement") +
guides(color = FALSE)
## Warning: Removed 34 rows containing missing values (geom_point).
library(ggrepel)
glimpse(elections_historic)
## Rows: 49
## Columns: 19
## $ election <int> 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, ...
## $ year <int> 1824, 1828, 1832, 1836, 1840, 1844, 1848, 1852, 1856...
## $ winner <chr> "John Quincy Adams", "Andrew Jackson", "Andrew Jacks...
## $ win_party <chr> "D.-R.", "Dem.", "Dem.", "Dem.", "Whig", "Dem.", "Wh...
## $ ec_pct <dbl> 0.3218, 0.6820, 0.7657, 0.5782, 0.7959, 0.6182, 0.56...
## $ popular_pct <dbl> 0.3092, 0.5593, 0.5474, 0.5079, 0.5287, 0.4954, 0.47...
## $ popular_margin <dbl> -0.1044, 0.1225, 0.1781, 0.1420, 0.0605, 0.0145, 0.0...
## $ votes <int> 113142, 642806, 702735, 763291, 1275583, 1339570, 13...
## $ margin <int> -38221, 140839, 228628, 213384, 145938, 39413, 13788...
## $ runner_up <chr> "Andrew Jackson", "John Quincy Adams", "Henry Clay",...
## $ ru_part <chr> "D.-R.", "N. R.", "N. R.", "Whig", "Dem.", "Whig", "...
## $ turnout_pct <dbl> 0.269, 0.573, 0.570, 0.565, 0.803, 0.792, 0.728, 0.6...
## $ winner_lname <chr> "Adams", "Jackson", "Jackson", "Buren", "Harrison", ...
## $ winner_label <chr> "Adams 1824", "Jackson 1828", "Jackson 1832", "Buren...
## $ ru_lname <chr> "Jackson", "Adams", "Clay", "Harrison", "Buren", "Cl...
## $ ru_label <chr> "Jackson 1824", "Adams 1828", "Clay 1832", "Harrison...
## $ two_term <lgl> FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE...
## $ ec_votes <dbl> 84, 178, 219, 170, 234, 170, 163, 254, 174, 180, 212...
## $ ec_denom <dbl> 261, 261, 286, 294, 294, 275, 290, 296, 296, 303, 23...
p_title <- "Presidential Elections: Popular & Electoral College Margins"
p_subtitle <- "1824-2016"
p_caption <- "Data for 2016 are provisional."
x_label <- "Winner's share of Popular Vote"
y_label <- "Winner's share of Electoral College Votes"
p <- ggplot(elections_historic, aes(x = popular_pct, y = ec_pct, label = winner_label))
p + geom_hline(yintercept = 0.5, size = 1.4, color = "gray80") +
geom_vline(xintercept = 0.5, size = 1.4, color = "gray80") +
geom_point(aes(color = ru_part)) +
geom_text_repel(data = subset(elections_historic, year >= 1992),
mapping = aes(color = ru_part),
show.legend = FALSE) +
annotate(geom = "rect", xmin = 0.3, xmax = 0.5, ymin = 0.5, ymax = 1, fill = "steelblue", alpha = 0.1) +
scale_x_continuous(labels = scales::percent) +
scale_y_continuous(labels = scales::percent) +
labs(x = x_label, y = y_label, title = p_title, subtitle = p_subtitle, caption = p_caption)
p <- ggplot(data = elections_historic, mapping = aes(x = reorder(year, popular_pct), y = popular_pct))
p + geom_point(size = 3, aes(color = ru_part)) +
labs(x = "Donor Procurement Rate", y = NULL) +
scale_y_continuous(labels = scales::percent) +
coord_flip()
life_by_continent <- gapminder %>% filter(year == 2007) %>% group_by(continent) %>% summarize(m = mean(lifeExp))
life_by_continent
## # A tibble: 5 x 2
## continent m
## * <fct> <dbl>
## 1 Africa 54.8
## 2 Americas 73.6
## 3 Asia 70.7
## 4 Europe 77.6
## 5 Oceania 80.7
p <- ggplot(data = life_by_continent, mapping = aes(x = reorder(continent, m), y = m))
p + geom_col() +
labs(x = NULL, y = "Life expectancy in years, 2007") +
coord_flip()
p <- ggplot(data = life_by_continent, mapping = aes(x = reorder(continent, m), y = m))
p + geom_point(size = 5) +
labs(x = NULL, y = "Life expectancy in years, 2007") +
coord_flip()
degree_by_race <- gss_sm %>% group_by(race, degree) %>% summarize(N = n()) %>% mutate(pct = round(N/sum(N)*100, 0))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
degree_by_race
## # A tibble: 18 x 4
## # Groups: race [3]
## race degree N pct
## <fct> <fct> <int> <dbl>
## 1 White Lt High School 197 9
## 2 White High School 1057 50
## 3 White Junior College 166 8
## 4 White Bachelor 426 20
## 5 White Graduate 250 12
## 6 White <NA> 4 0
## 7 Black Lt High School 60 12
## 8 Black High School 292 60
## 9 Black Junior College 33 7
## 10 Black Bachelor 71 14
## 11 Black Graduate 31 6
## 12 Black <NA> 3 1
## 13 Other Lt High School 71 26
## 14 Other High School 112 40
## 15 Other Junior College 17 6
## 16 Other Bachelor 39 14
## 17 Other Graduate 37 13
## 18 Other <NA> 1 0
degree_by_race %>% group_by(race) %>% summarize(total = sum(pct))
## # A tibble: 3 x 2
## race total
## * <fct> <dbl>
## 1 White 99
## 2 Black 100
## 3 Other 99
degree_by_sex <- gss_sm %>% group_by(sex, degree) %>% summarize(N = n()) %>% mutate(pct = round(N/sum(N)*100, 0))
## `summarise()` has grouped output by 'sex'. You can override using the `.groups` argument.
degree_by_sex
## # A tibble: 12 x 4
## # Groups: sex [2]
## sex degree N pct
## <fct> <fct> <int> <dbl>
## 1 Male Lt High School 147 12
## 2 Male High School 662 52
## 3 Male Junior College 89 7
## 4 Male Bachelor 243 19
## 5 Male Graduate 132 10
## 6 Male <NA> 3 0
## 7 Female Lt High School 181 11
## 8 Female High School 799 50
## 9 Female Junior College 127 8
## 10 Female Bachelor 293 18
## 11 Female Graduate 186 12
## 12 Female <NA> 5 0
degree_by_sex %>% group_by(sex) %>% summarize(total = sum(pct))
## # A tibble: 2 x 2
## sex total
## * <fct> <dbl>
## 1 Male 100
## 2 Female 99
degree_by_region <- gss_sm %>% group_by(region, degree) %>% summarize(N = n()) %>% mutate(pct = round(N/sum(N)*100, 0))
## `summarise()` has grouped output by 'region'. You can override using the `.groups` argument.
degree_by_region
## # A tibble: 49 x 4
## # Groups: region [9]
## region degree N pct
## <fct> <fct> <int> <dbl>
## 1 New England Lt High School 13 7
## 2 New England High School 77 44
## 3 New England Junior College 19 11
## 4 New England Bachelor 39 22
## 5 New England Graduate 27 15
## 6 Middle Atlantic Lt High School 25 8
## 7 Middle Atlantic High School 160 51
## 8 Middle Atlantic Junior College 21 7
## 9 Middle Atlantic Bachelor 59 19
## 10 Middle Atlantic Graduate 47 15
## # ... with 39 more rows
degree_by_region %>% group_by(region) %>% summarize(total = sum(pct))
## # A tibble: 9 x 2
## region total
## * <fct> <dbl>
## 1 New England 99
## 2 Middle Atlantic 100
## 3 E. Nor. Central 100
## 4 W. Nor. Central 100
## 5 South Atlantic 100
## 6 E. Sou. Central 100
## 7 W. Sou. Central 100
## 8 Mountain 100
## 9 Pacific 100
gss_sm %>% group_by(degree) %>% summarize(mean = mean(childs, na.rm = TRUE), median = median(childs, na.rm = TRUE))
## # A tibble: 6 x 3
## degree mean median
## * <fct> <dbl> <dbl>
## 1 Lt High School 2.81 3
## 2 High School 1.86 2
## 3 Junior College 1.77 2
## 4 Bachelor 1.45 1
## 5 Graduate 1.52 2
## 6 <NA> 3.6 4
str(gapminder)
## tibble [1,704 x 6] (S3: tbl_df/tbl/data.frame)
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num [1:1704] 28.8 30.3 32 34 36.1 ...
## $ pop : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
p <- ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, group = year))
p + geom_boxplot()
p <- ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, group = year))
p + geom_boxplot() + facet_wrap(~ continent)
p <- ggplot(data = gapminder, mapping = aes(x = year, y = pop, group = year))
p + geom_boxplot() + scale_y_log10() + facet_wrap(~ continent)
life_by_year <- gapminder %>% group_by(continent, year)
life_by_year # this produce improper result
## # A tibble: 1,704 x 6
## # Groups: continent, year [60]
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## # ... with 1,694 more rows
p <- ggplot(data = life_by_year, mapping = aes(x = year, y = lifeExp), group = year)
p + geom_boxplot() + facet_wrap(~ continent)
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
p <- ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, group = year))
p + geom_boxplot(notch = TRUE) + facet_wrap(~ continent)
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
p <- ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, group = year))
p + geom_boxplot(notch = TRUE, varwidth = TRUE)
p <- ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, group = year))
p + geom_violin()
by_country <- organdata %>% group_by(consent_law, country) %>%
summarize_if(is.numeric, funs(mean, sd), na.rm = TRUE) %>%
ungroup()
p <- ggplot(data = by_country, mapping = aes(x = reorder(country, donors_mean), y = donors_mean))
p + geom_pointrange(mapping = aes(ymin = donors_mean - donors_sd, ymax = donors_mean + donors_sd)) +
labs(x = NULL, y = "Donor Procurement Rate") + coord_flip()
p + geom_linerange(mapping = aes(ymin = donors_mean - donors_sd, ymax = donors_mean + donors_sd)) +
labs(x = NULL, y = "Donor Procurement Rate") + coord_flip()
p + geom_crossbar(mapping = aes(ymin = donors_mean - donors_sd, ymax = donors_mean + donors_sd)) +
labs(x = NULL, y = "Donor Procurement Rate") + coord_flip()
p + geom_errorbar(mapping = aes(ymin = donors_mean - donors_sd, ymax = donors_mean + donors_sd)) +
labs(x = NULL, y = "Donor Procurement Rate") + coord_flip()
p <- ggplot(data = gapminder, mapping = aes(x = log(gdpPercap), y = lifeExp))
p + geom_point(alpha = 0.1) +
geom_smooth(color = "tomato", fill = "tomato", method = MASS::rlm) +
geom_smooth(color = "steelblue", fill = "steelblue", method = "lm")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
p + geom_point(alpha = 0.1) +
geom_smooth(color = "tomato", size = 1.2, method = "lm", formula = y ~ splines::bs(x, 3), se = FALSE)
p + geom_point(alpha = 0.1) +
geom_quantile(color = "tomato", size = 1.2, method = "rqss", lambda = 1, quantiles = c(0.2, 0.5, 0.85))
## Smoothing formula not specified. Using: y ~ qss(x, lambda = 1)
model_colors <- RColorBrewer::brewer.pal(3, "Set1")
model_colors
## [1] "#E41A1C" "#377EB8" "#4DAF4A"
p0 <- ggplot(data = gapminder, mapping = aes(x = log(gdpPercap), y = lifeExp))
p1 <- p0 + geom_point(alpha = 0.2) +
geom_smooth(method = "lm", aes(color = "OLS", fill = "OLS")) +
geom_smooth(method = "lm", formula = y ~ splines::bs(x, df = 3), aes(color = "Cubic Spline", fill = "Cubic Spline")) +
geom_smooth(method = "loess", aes(color = "LOESS", fill = "LOESS"))
p1 + scale_color_manual(name = "Models", values = model_colors) +
scale_fill_manual(name = "Models", values = model_colors) +
theme(legend.position = "top")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
gapminder
## # A tibble: 1,704 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## # ... with 1,694 more rows
str(gapminder)
## tibble [1,704 x 6] (S3: tbl_df/tbl/data.frame)
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num [1:1704] 28.8 30.3 32 34 36.1 ...
## $ pop : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
summary(gapminder)
## country continent year lifeExp
## Afghanistan: 12 Africa :624 Min. :1952 Min. :23.60
## Albania : 12 Americas:300 1st Qu.:1966 1st Qu.:48.20
## Algeria : 12 Asia :396 Median :1980 Median :60.71
## Angola : 12 Europe :360 Mean :1980 Mean :59.47
## Argentina : 12 Oceania : 24 3rd Qu.:1993 3rd Qu.:70.85
## Australia : 12 Max. :2007 Max. :82.60
## (Other) :1632
## pop gdpPercap
## Min. :6.001e+04 Min. : 241.2
## 1st Qu.:2.794e+06 1st Qu.: 1202.1
## Median :7.024e+06 Median : 3531.8
## Mean :2.960e+07 Mean : 7215.3
## 3rd Qu.:1.959e+07 3rd Qu.: 9325.5
## Max. :1.319e+09 Max. :113523.1
##
out <- lm(formula = lifeExp ~ gdpPercap + pop + continent, data = gapminder)
summary(out)
##
## Call:
## lm(formula = lifeExp ~ gdpPercap + pop + continent, data = gapminder)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.161 -4.486 0.297 5.110 25.175
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.781e+01 3.395e-01 140.819 < 2e-16 ***
## gdpPercap 4.495e-04 2.346e-05 19.158 < 2e-16 ***
## pop 6.570e-09 1.975e-09 3.326 0.000901 ***
## continentAmericas 1.348e+01 6.000e-01 22.458 < 2e-16 ***
## continentAsia 8.193e+00 5.712e-01 14.342 < 2e-16 ***
## continentEurope 1.747e+01 6.246e-01 27.973 < 2e-16 ***
## continentOceania 1.808e+01 1.782e+00 10.146 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.365 on 1697 degrees of freedom
## Multiple R-squared: 0.5821, Adjusted R-squared: 0.5806
## F-statistic: 393.9 on 6 and 1697 DF, p-value: < 2.2e-16
str(out)
## List of 13
## $ coefficients : Named num [1:7] 4.78e+01 4.50e-04 6.57e-09 1.35e+01 8.19 ...
## ..- attr(*, "names")= chr [1:7] "(Intercept)" "gdpPercap" "pop" "continentAmericas" ...
## $ residuals : Named num [1:1704] -27.6 -26.1 -24.5 -22.4 -20.3 ...
## ..- attr(*, "names")= chr [1:1704] "1" "2" "3" "4" ...
## $ effects : Named num [1:1704] -2455.1 311.1 42.6 101.1 -17.2 ...
## ..- attr(*, "names")= chr [1:1704] "(Intercept)" "gdpPercap" "pop" "continentAmericas" ...
## $ rank : int 7
## $ fitted.values: Named num [1:1704] 56.4 56.4 56.5 56.5 56.4 ...
## ..- attr(*, "names")= chr [1:1704] "1" "2" "3" "4" ...
## $ assign : int [1:7] 0 1 2 3 3 3 3
## $ qr :List of 5
## ..$ qr : num [1:1704, 1:7] -41.2795 0.0242 0.0242 0.0242 0.0242 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:1704] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:7] "(Intercept)" "gdpPercap" "pop" "continentAmericas" ...
## .. ..- attr(*, "assign")= int [1:7] 0 1 2 3 3 3 3
## .. ..- attr(*, "contrasts")=List of 1
## .. .. ..$ continent: chr "contr.treatment"
## ..$ qraux: num [1:7] 1.02 1.02 1 1.01 1.04 ...
## ..$ pivot: int [1:7] 1 2 3 4 5 6 7
## ..$ tol : num 1e-07
## ..$ rank : int 7
## ..- attr(*, "class")= chr "qr"
## $ df.residual : int 1697
## $ contrasts :List of 1
## ..$ continent: chr "contr.treatment"
## $ xlevels :List of 1
## ..$ continent: chr [1:5] "Africa" "Americas" "Asia" "Europe" ...
## $ call : language lm(formula = lifeExp ~ gdpPercap + pop + continent, data = gapminder)
## $ terms :Classes 'terms', 'formula' language lifeExp ~ gdpPercap + pop + continent
## .. ..- attr(*, "variables")= language list(lifeExp, gdpPercap, pop, continent)
## .. ..- attr(*, "factors")= int [1:4, 1:3] 0 1 0 0 0 0 1 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:4] "lifeExp" "gdpPercap" "pop" "continent"
## .. .. .. ..$ : chr [1:3] "gdpPercap" "pop" "continent"
## .. ..- attr(*, "term.labels")= chr [1:3] "gdpPercap" "pop" "continent"
## .. ..- attr(*, "order")= int [1:3] 1 1 1
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(lifeExp, gdpPercap, pop, continent)
## .. ..- attr(*, "dataClasses")= Named chr [1:4] "numeric" "numeric" "numeric" "factor"
## .. .. ..- attr(*, "names")= chr [1:4] "lifeExp" "gdpPercap" "pop" "continent"
## $ model :'data.frame': 1704 obs. of 4 variables:
## ..$ lifeExp : num [1:1704] 28.8 30.3 32 34 36.1 ...
## ..$ gdpPercap: num [1:1704] 779 821 853 836 740 ...
## ..$ pop : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## ..$ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## ..- attr(*, "terms")=Classes 'terms', 'formula' language lifeExp ~ gdpPercap + pop + continent
## .. .. ..- attr(*, "variables")= language list(lifeExp, gdpPercap, pop, continent)
## .. .. ..- attr(*, "factors")= int [1:4, 1:3] 0 1 0 0 0 0 1 0 0 0 ...
## .. .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. .. ..$ : chr [1:4] "lifeExp" "gdpPercap" "pop" "continent"
## .. .. .. .. ..$ : chr [1:3] "gdpPercap" "pop" "continent"
## .. .. ..- attr(*, "term.labels")= chr [1:3] "gdpPercap" "pop" "continent"
## .. .. ..- attr(*, "order")= int [1:3] 1 1 1
## .. .. ..- attr(*, "intercept")= int 1
## .. .. ..- attr(*, "response")= int 1
## .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. .. ..- attr(*, "predvars")= language list(lifeExp, gdpPercap, pop, continent)
## .. .. ..- attr(*, "dataClasses")= Named chr [1:4] "numeric" "numeric" "numeric" "factor"
## .. .. .. ..- attr(*, "names")= chr [1:4] "lifeExp" "gdpPercap" "pop" "continent"
## - attr(*, "class")= chr "lm"
out$coefficients
## (Intercept) gdpPercap pop continentAmericas
## 4.781408e+01 4.495058e-04 6.569823e-09 1.347594e+01
## continentAsia continentEurope continentOceania
## 8.192632e+00 1.747269e+01 1.808330e+01
#out$residuals
#out$fitted.values
out$df.residual
## [1] 1697
#out$model
min_gdp <- min(gapminder$gdpPercap)
max_gdp <- max(gapminder$gdpPercap)
med_pop <- median(gapminder$pop)
pred_df <- expand.grid(gdpPercap = seq(from = min_gdp, to = max_gdp, length.out = 100),
pop = med_pop, continent = c("Africa", "Americas", "Asia", "Europe", "Oceania"))
dim(pred_df)
## [1] 500 3
head(pred_df)
## gdpPercap pop continent
## 1 241.1659 7023596 Africa
## 2 1385.4282 7023596 Africa
## 3 2529.6905 7023596 Africa
## 4 3673.9528 7023596 Africa
## 5 4818.2150 7023596 Africa
## 6 5962.4773 7023596 Africa
pred_out <- predict(object = out, newdata = pred_df, interval = "predict")
head(pred_out)
## fit lwr upr
## 1 47.96863 31.54775 64.38951
## 2 48.48298 32.06231 64.90365
## 3 48.99733 32.57670 65.41797
## 4 49.51169 33.09092 65.93245
## 5 50.02604 33.60497 66.44711
## 6 50.54039 34.11885 66.96193
pred_df <- cbind(pred_df, pred_out)
head(pred_df)
## gdpPercap pop continent fit lwr upr
## 1 241.1659 7023596 Africa 47.96863 31.54775 64.38951
## 2 1385.4282 7023596 Africa 48.48298 32.06231 64.90365
## 3 2529.6905 7023596 Africa 48.99733 32.57670 65.41797
## 4 3673.9528 7023596 Africa 49.51169 33.09092 65.93245
## 5 4818.2150 7023596 Africa 50.02604 33.60497 66.44711
## 6 5962.4773 7023596 Africa 50.54039 34.11885 66.96193
p <- ggplot(data = subset(pred_df, continent %in% c("Europe", "Africa")),
aes(x = gdpPercap, y = fit, ymin = lwr, ymax = upr,
color = continent, fill = continent, group = continent))
p + geom_point(data = subset(gapminder, continent %in% c("Europe", "Africa")),
aes(x = gdpPercap, y = lifeExp, color = continent),
alpha = 0.5, inherit.aes = FALSE) +
geom_line() +
geom_ribbon(alpha = 0.2, color = FALSE) +
scale_x_log10(labels = scales::dollar)
library(broom)
out_comp <- tidy(out)
out_comp %>% round_df()
## # A tibble: 7 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 47.8 0.34 141. 0
## 2 gdpPercap 0 0 19.2 0
## 3 pop 0 0 3.33 0
## 4 continentAmericas 13.5 0.6 22.5 0
## 5 continentAsia 8.19 0.570 14.3 0
## 6 continentEurope 17.5 0.62 28.0 0
## 7 continentOceania 18.1 1.78 10.2 0
p <- ggplot(out_comp, mapping = aes(x = term, y = estimate))
p + geom_point() + coord_flip()
out_conf <- tidy(out, conf.int = TRUE)
out_conf %>% round_df()
## # A tibble: 7 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 47.8 0.34 141. 0 47.2 48.5
## 2 gdpPercap 0 0 19.2 0 0 0
## 3 pop 0 0 3.33 0 0 0
## 4 continentAmericas 13.5 0.6 22.5 0 12.3 14.6
## 5 continentAsia 8.19 0.570 14.3 0 7.07 9.31
## 6 continentEurope 17.5 0.62 28.0 0 16.2 18.7
## 7 continentOceania 18.1 1.78 10.2 0 14.6 21.6
out_conf <- subset(out_conf, term %nin% "(Intercept)")
out_conf$nicelabs <- prefix_strip(out_conf$term, "continent")
p <- ggplot(out_conf, mapping = aes(x = reorder(nicelabs, estimate),
y = estimate, ymin = conf.low, ymax = conf.high))
p + geom_pointrange() + coord_flip() + labs(x = NULL, y = "OLS Estimate")
out_aug <- augment(out)
head(out_aug) %>% round_df()
## # A tibble: 6 x 10
## lifeExp gdpPercap pop continent .fitted .resid .hat .sigma .cooksd
## <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 28.8 779. 8.43e6 Asia 56.4 -27.6 0 8.34 0.01
## 2 30.3 821. 9.24e6 Asia 56.4 -26.1 0 8.34 0
## 3 32 853. 1.03e7 Asia 56.5 -24.5 0 8.35 0
## 4 34.0 836. 1.15e7 Asia 56.5 -22.4 0 8.35 0
## 5 36.1 740. 1.31e7 Asia 56.4 -20.3 0 8.35 0
## 6 38.4 786. 1.49e7 Asia 56.5 -18.0 0 8.36 0
## # ... with 1 more variable: .std.resid <dbl>
out_aug <- augment(out, data = gapminder)
head(out_aug) %>% round_df()
## # A tibble: 6 x 12
## country continent year lifeExp pop gdpPercap .fitted .resid .hat .sigma
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghan~ Asia 1952 28.8 8.43e6 779. 56.4 -27.6 0 8.34
## 2 Afghan~ Asia 1957 30.3 9.24e6 821. 56.4 -26.1 0 8.34
## 3 Afghan~ Asia 1962 32 1.03e7 853. 56.5 -24.5 0 8.35
## 4 Afghan~ Asia 1967 34.0 1.15e7 836. 56.5 -22.4 0 8.35
## 5 Afghan~ Asia 1972 36.1 1.31e7 740. 56.4 -20.3 0 8.35
## 6 Afghan~ Asia 1977 38.4 1.49e7 786. 56.5 -18.0 0 8.36
## # ... with 2 more variables: .cooksd <dbl>, .std.resid <dbl>
p <- ggplot(data = out_aug, mapping = aes(x = .fitted, y = .resid))
p + geom_point()
glance(out) %>% round_df()
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.580 0.580 8.37 394. 0 6 -6034. 12084. 12127.
## # ... with 3 more variables: deviance <dbl>, df.residual <dbl>, nobs <dbl>
library(survival)
out_cph <- coxph(Surv(time, status) ~ age + sex, data = lung)
out_surv <- survfit(out_cph)
summary(out_cph)
## Call:
## coxph(formula = Surv(time, status) ~ age + sex, data = lung)
##
## n= 228, number of events= 165
##
## coef exp(coef) se(coef) z Pr(>|z|)
## age 0.017045 1.017191 0.009223 1.848 0.06459 .
## sex -0.513219 0.598566 0.167458 -3.065 0.00218 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## age 1.0172 0.9831 0.9990 1.0357
## sex 0.5986 1.6707 0.4311 0.8311
##
## Concordance= 0.603 (se = 0.025 )
## Likelihood ratio test= 14.12 on 2 df, p=9e-04
## Wald test = 13.47 on 2 df, p=0.001
## Score (logrank) test = 13.72 on 2 df, p=0.001
#summary(out_surv)
out_tidy <- tidy(out_surv)
p <- ggplot(data = out_tidy, mapping = aes(time, estimate))
p + geom_line() + geom_ribbon(mapping = aes(ymin = conf.low, ymax = conf.high), alpha = 0.2)
eu77 <- gapminder %>% filter(continent == "Europe", year == 1977)
fit <- lm(lifeExp ~ log(gdpPercap), data = eu77)
summary(fit)
##
## Call:
## lm(formula = lifeExp ~ log(gdpPercap), data = eu77)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.4956 -1.0306 0.0935 1.1755 3.7125
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.489 7.161 4.118 0.000306 ***
## log(gdpPercap) 4.488 0.756 5.936 2.17e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.114 on 28 degrees of freedom
## Multiple R-squared: 0.5572, Adjusted R-squared: 0.5414
## F-statistic: 35.24 on 1 and 28 DF, p-value: 2.173e-06
out_le <- gapminder %>% group_by(continent, year) %>% nest()
out_le
## # A tibble: 60 x 3
## # Groups: continent, year [60]
## continent year data
## <fct> <int> <list>
## 1 Asia 1952 <tibble [33 x 4]>
## 2 Asia 1957 <tibble [33 x 4]>
## 3 Asia 1962 <tibble [33 x 4]>
## 4 Asia 1967 <tibble [33 x 4]>
## 5 Asia 1972 <tibble [33 x 4]>
## 6 Asia 1977 <tibble [33 x 4]>
## 7 Asia 1982 <tibble [33 x 4]>
## 8 Asia 1987 <tibble [33 x 4]>
## 9 Asia 1992 <tibble [33 x 4]>
## 10 Asia 1997 <tibble [33 x 4]>
## # ... with 50 more rows
out_le %>% filter(continent == "Europe" & year == 1977) %>% unnest()
## Warning: `cols` is now required when using unnest().
## Please use `cols = c(data)`
## # A tibble: 30 x 6
## # Groups: continent, year [1]
## continent year country lifeExp pop gdpPercap
## <fct> <int> <fct> <dbl> <int> <dbl>
## 1 Europe 1977 Albania 68.9 2509048 3533.
## 2 Europe 1977 Austria 72.2 7568430 19749.
## 3 Europe 1977 Belgium 72.8 9821800 19118.
## 4 Europe 1977 Bosnia and Herzegovina 69.9 4086000 3528.
## 5 Europe 1977 Bulgaria 70.8 8797022 7612.
## 6 Europe 1977 Croatia 70.6 4318673 11305.
## 7 Europe 1977 Czech Republic 70.7 10161915 14800.
## 8 Europe 1977 Denmark 74.7 5088419 20423.
## 9 Europe 1977 Finland 72.5 4738902 15605.
## 10 Europe 1977 France 73.8 53165019 18293.
## # ... with 20 more rows
fit_ols <- function(df) {
lm(lifeExp ~ log(gdpPercap), data = df)
}
fit_ols
## function(df) {
## lm(lifeExp ~ log(gdpPercap), data = df)
## }
fit_ols(df = gapminder)
##
## Call:
## lm(formula = lifeExp ~ log(gdpPercap), data = df)
##
## Coefficients:
## (Intercept) log(gdpPercap)
## -9.101 8.405
summary(fit_ols(gapminder))
##
## Call:
## lm(formula = lifeExp ~ log(gdpPercap), data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.778 -4.204 1.212 4.658 19.285
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.1009 1.2277 -7.413 1.93e-13 ***
## log(gdpPercap) 8.4051 0.1488 56.500 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.62 on 1702 degrees of freedom
## Multiple R-squared: 0.6522, Adjusted R-squared: 0.652
## F-statistic: 3192 on 1 and 1702 DF, p-value: < 2.2e-16
out_le <- gapminder %>% group_by(continent, year) %>% nest() %>%
mutate(model = map(data, fit_ols))
out_le
## # A tibble: 60 x 4
## # Groups: continent, year [60]
## continent year data model
## <fct> <int> <list> <list>
## 1 Asia 1952 <tibble [33 x 4]> <lm>
## 2 Asia 1957 <tibble [33 x 4]> <lm>
## 3 Asia 1962 <tibble [33 x 4]> <lm>
## 4 Asia 1967 <tibble [33 x 4]> <lm>
## 5 Asia 1972 <tibble [33 x 4]> <lm>
## 6 Asia 1977 <tibble [33 x 4]> <lm>
## 7 Asia 1982 <tibble [33 x 4]> <lm>
## 8 Asia 1987 <tibble [33 x 4]> <lm>
## 9 Asia 1992 <tibble [33 x 4]> <lm>
## 10 Asia 1997 <tibble [33 x 4]> <lm>
## # ... with 50 more rows
out_tidy <- gapminder %>% group_by(continent, year) %>% nest() %>%
mutate(model = map(data, fit_ols), tidied = map(model, tidy)) %>%
select(!(data:model)) %>%
unnest(tidied) %>%
filter(term %nin% "(Intercept)" & continent %nin% "Oceania") %>%
ungroup()
out_tidy %>% sample_n(5)
## # A tibble: 5 x 7
## continent year term estimate std.error statistic p.value
## <fct> <int> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Asia 1957 log(gdpPercap) 4.17 1.28 3.26 0.00271
## 2 Asia 2007 log(gdpPercap) 5.16 0.694 7.43 0.0000000226
## 3 Asia 1992 log(gdpPercap) 5.09 0.649 7.84 0.00000000760
## 4 Asia 1997 log(gdpPercap) 5.11 0.628 8.15 0.00000000335
## 5 Europe 1977 log(gdpPercap) 4.49 0.756 5.94 0.00000217
p <- ggplot(data = out_tidy, mapping = aes(x = year, y = estimate,
ymin = estimate - 2*std.error, ymax = estimate + 2*std.error,
group = continent, color = continent))
p + geom_pointrange(position = position_dodge(width = 1)) +
scale_x_continuous(breaks = unique(gapminder$year)) +
theme(legend.position = "top") +
labs(x = "Year", y = "Estimate", color = "Continent")
library(margins)
gss_sm$polviews_m <- relevel(gss_sm$polviews, ref = "Moderate")
out_bo <- glm(obama ~ polviews_m + sex*race, family = "binomial", data = gss_sm)
summary(out_bo)
##
## Call:
## glm(formula = obama ~ polviews_m + sex * race, family = "binomial",
## data = gss_sm)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.9045 -0.5541 0.1772 0.5418 2.2437
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.296493 0.134091 2.211 0.02703 *
## polviews_mExtremely Liberal 2.372950 0.525045 4.520 6.20e-06 ***
## polviews_mLiberal 2.600031 0.356666 7.290 3.10e-13 ***
## polviews_mSlightly Liberal 1.293172 0.248435 5.205 1.94e-07 ***
## polviews_mSlightly Conservative -1.355277 0.181291 -7.476 7.68e-14 ***
## polviews_mConservative -2.347463 0.200384 -11.715 < 2e-16 ***
## polviews_mExtremely Conservative -2.727384 0.387210 -7.044 1.87e-12 ***
## sexFemale 0.254866 0.145370 1.753 0.07956 .
## raceBlack 3.849526 0.501319 7.679 1.61e-14 ***
## raceOther -0.002143 0.435763 -0.005 0.99608
## sexFemale:raceBlack -0.197506 0.660066 -0.299 0.76477
## sexFemale:raceOther 1.574829 0.587657 2.680 0.00737 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2247.9 on 1697 degrees of freedom
## Residual deviance: 1345.9 on 1686 degrees of freedom
## (1169 observations deleted due to missingness)
## AIC: 1369.9
##
## Number of Fisher Scoring iterations: 6
bo_m <- margins(out_bo)
summary(bo_m)
## factor AME SE z p lower
## polviews_mConservative -0.4119 0.0283 -14.5394 0.0000 -0.4674
## polviews_mExtremely Conservative -0.4538 0.0420 -10.7971 0.0000 -0.5361
## polviews_mExtremely Liberal 0.2681 0.0295 9.0996 0.0000 0.2103
## polviews_mLiberal 0.2768 0.0229 12.0736 0.0000 0.2319
## polviews_mSlightly Conservative -0.2658 0.0330 -8.0596 0.0000 -0.3304
## polviews_mSlightly Liberal 0.1933 0.0303 6.3896 0.0000 0.1340
## raceBlack 0.4032 0.0173 23.3568 0.0000 0.3694
## raceOther 0.1247 0.0386 3.2297 0.0012 0.0490
## sexFemale 0.0443 0.0177 2.5073 0.0122 0.0097
## upper
## -0.3564
## -0.3714
## 0.3258
## 0.3218
## -0.2011
## 0.2526
## 0.4371
## 0.2005
## 0.0789
plot(bo_m)
bo_gg <- as_tibble(summary(bo_m))
prefixes <- c("polviews_m", "sex")
bo_gg$factor <- prefix_strip(bo_gg$factor, prefixes)
bo_gg$factor <- prefix_replace(bo_gg$factor, "race", "Race: ")
bo_gg %>% select(factor, AME, lower, upper)
## # A tibble: 9 x 4
## factor AME lower upper
## <chr> <dbl> <dbl> <dbl>
## 1 Conservative -0.412 -0.467 -0.356
## 2 Extremely Conservative -0.454 -0.536 -0.371
## 3 Extremely Liberal 0.268 0.210 0.326
## 4 Liberal 0.277 0.232 0.322
## 5 Slightly Conservative -0.266 -0.330 -0.201
## 6 Slightly Liberal 0.193 0.134 0.253
## 7 Race: Black 0.403 0.369 0.437
## 8 Race: Other 0.125 0.0490 0.200
## 9 Female 0.0443 0.00967 0.0789
p <- ggplot(data = bo_gg, aes(x = reorder(factor, AME),
y = AME, ymin = lower, ymax = upper))
p + geom_hline(yintercept = 0, color = "gray80") +
geom_pointrange() + coord_flip() +
labs(x = NULL, y = "Average Marginal Effect")
cplot(out_bo, "sex")
pv_cp <- cplot(out_bo, x = "sex", draw = FALSE)
p <- ggplot(data = pv_cp, aes(x = reorder(xvals, yvals),
y = yvals, ymin = lower, ymax = upper))
p + geom_hline(yintercept = 0, color = "gray80") +
geom_pointrange() + coord_flip() +
labs(x = NULL, y = "Conditional Effect")
library(survey)
## Loading required package: grid
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
##
## dotchart
library(srvyr)
##
## Attaching package: 'srvyr'
## The following object is masked from 'package:stats':
##
## filter
options(survey.lonely.psu = "adjust")
options(na.action = "na.pass")
gss_wt <- subset(gss_lon, year > 1974) %>%
mutate(stratvar = interaction(year, vstrat)) %>%
as_survey_design(ids = vpsu, strata = stratvar, weights = wtssall, nest = TRUE)
out_grp <- gss_wt %>%
filter(year %in% seq(1976, 2016, by = 4), !is.na(degree)) %>%
group_by(year, race, degree) %>%
summarize(prop = survey_mean(na.rm = TRUE))
out_grp
## # A tibble: 146 x 5
## # Groups: year, race [30]
## year race degree prop prop_se
## <dbl> <fct> <fct> <dbl> <dbl>
## 1 1976 White Lt High School 0.328 0.0160
## 2 1976 White High School 0.518 0.0162
## 3 1976 White Junior College 0.0129 0.00298
## 4 1976 White Bachelor 0.101 0.00960
## 5 1976 White Graduate 0.0393 0.00644
## 6 1976 Black Lt High School 0.562 0.0611
## 7 1976 Black High School 0.337 0.0476
## 8 1976 Black Junior College 0.0426 0.0193
## 9 1976 Black Bachelor 0.0581 0.0239
## 10 1976 Other Lt High School 0.250 0.152
## # ... with 136 more rows
out_mrg <- gss_wt %>%
filter(year %in% seq(1976, 2016, by = 4)) %>%
mutate(racedeg = interaction(race, degree)) %>%
group_by(year, racedeg) %>%
summarize(prop = survey_mean(na.rm = TRUE))
out_mrg
## # A tibble: 155 x 4
## # Groups: year [10]
## year racedeg prop prop_se
## <dbl> <fct> <dbl> <dbl>
## 1 1976 White.Lt High School 0.297 0.0146
## 2 1976 Black.Lt High School 0.0470 0.00837
## 3 1976 Other.Lt High School 0.00194 0.00138
## 4 1976 White.High School 0.469 0.0159
## 5 1976 Black.High School 0.0282 0.00593
## 6 1976 Other.High School 0.00324 0.00166
## 7 1976 White.Junior College 0.0117 0.00268
## 8 1976 Black.Junior College 0.00356 0.00162
## 9 1976 White.Bachelor 0.0916 0.00883
## 10 1976 Black.Bachelor 0.00486 0.00213
## # ... with 145 more rows
out_mrg <- gss_wt %>%
filter(year %in% seq(1976, 2016, by = 4)) %>%
mutate(racedeg = interaction(race, degree)) %>%
group_by(year, racedeg) %>%
summarize(prop = survey_mean(na.rm = TRUE)) %>%
separate(racedeg, sep = "\\.", into = c("race", "degree"))
out_mrg
## # A tibble: 155 x 5
## # Groups: year [10]
## year race degree prop prop_se
## <dbl> <chr> <chr> <dbl> <dbl>
## 1 1976 White Lt High School 0.297 0.0146
## 2 1976 Black Lt High School 0.0470 0.00837
## 3 1976 Other Lt High School 0.00194 0.00138
## 4 1976 White High School 0.469 0.0159
## 5 1976 Black High School 0.0282 0.00593
## 6 1976 Other High School 0.00324 0.00166
## 7 1976 White Junior College 0.0117 0.00268
## 8 1976 Black Junior College 0.00356 0.00162
## 9 1976 White Bachelor 0.0916 0.00883
## 10 1976 Black Bachelor 0.00486 0.00213
## # ... with 145 more rows
p <- ggplot(data = subset(out_grp, race %nin% "Other"),
mapping = aes(x = degree, y = prop, ymin = prop - 2*prop_se, ymax = prop + 2*prop_se,
fill = race, color = race))
dodge <- position_dodge(width = 0.9)
p + geom_col(position = dodge, alpha = 0.2) +
geom_errorbar(position = dodge, width = 0.2) +
scale_x_discrete(labels = scales::wrap_format(10)) +
scale_y_continuous(labels = scales::percent) +
scale_color_brewer(type = "qual", palette = "Dark2") +
scale_fill_brewer(type = "qual", palette = "Dark2") +
labs(title = "Educational Attainment by Race",
subtitle = "GSS 1976-2016",
fill = "Race", color = "Race", x = NULL, y = "Percent") +
facet_wrap(~ year, ncol = 2) +
theme(legend.position = "top")
p <- ggplot(data = subset(out_grp, race %nin% "Other"),
mapping = aes(x = year, y = prop, ymin = prop - 2*prop_se, ymax = prop + 2*prop_se,
fill = race, color = race))
p + geom_ribbon(alpha = 0.3, aes(color = NULL)) +
geom_line() +
scale_y_continuous(labels = scales::percent) +
scale_color_brewer(type = "qual", palette = "Dark2") +
scale_fill_brewer(type = "qual", palette = "Dark2") +
labs(title = "Educational Attainment by Race",
subtitle = "GSS 1976-2016",
fill = "Race", color = "Race", x = NULL, y = "Percent") +
facet_wrap(~ degree, ncol = 1) +
theme(legend.position = "top")
out <- lm(formula = lifeExp ~ log(gdpPercap) + pop + continent, data = gapminder)
plot(out, which = c(1, 2), ask = FALSE)
#install.packages("ggfortify") # for using autoplot()
library(coefplot)
out <- lm(formula = lifeExp ~ log(gdpPercap) + log(pop) + continent, data = gapminder)
coefplot(out, sort = "magnitude", intercept = FALSE)
#install.packages("infer")
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
organdata_sm <- organdata %>% select(donors, pop_dens, pubhealth, roads, consent_law)
ggpairs(data = organdata_sm, mapping = aes(color = consent_law),
upper = list(continuous = wrap("density"), combo = wrap("box_no_facet")),
lower = list(continuous = wrap("points"), combo = wrap("dot_no_facet")))
## Warning: Removed 34 rows containing non-finite values (stat_density).
## Warning: Removed 34 rows containing non-finite values (stat_density2d).
## Warning: Removed 37 rows containing non-finite values (stat_density2d).
## Warning: Removed 34 rows containing non-finite values (stat_density2d).
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
## Warning: Removed 34 rows containing missing values (geom_point).
## Warning: Removed 17 rows containing non-finite values (stat_density).
## Warning: Removed 21 rows containing non-finite values (stat_density2d).
## Warning: Removed 17 rows containing non-finite values (stat_density2d).
## Warning: Removed 17 rows containing non-finite values (stat_boxplot).
## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 21 rows containing missing values (geom_point).
## Warning: Removed 21 rows containing non-finite values (stat_density).
## Warning: Removed 21 rows containing non-finite values (stat_density2d).
## Warning: Removed 21 rows containing non-finite values (stat_boxplot).
## Warning: Removed 34 rows containing missing values (geom_point).
## Warning: Removed 17 rows containing missing values (geom_point).
## Warning: Removed 21 rows containing missing values (geom_point).
## Warning: Removed 17 rows containing non-finite values (stat_density).
## Warning: Removed 17 rows containing non-finite values (stat_boxplot).
## Warning: Removed 34 rows containing missing values (geom_point).
## Warning: Removed 17 rows containing missing values (geom_point).
## Warning: Removed 21 rows containing missing values (geom_point).
## Warning: Removed 17 rows containing missing values (geom_point).
election %>% select(state, total_vote, r_points, pct_trump, party, census) %>%
sample_n(5)
## # A tibble: 5 x 6
## state total_vote r_points pct_trump party census
## <chr> <dbl> <dbl> <dbl> <chr> <chr>
## 1 South Dakota 370093 29.8 61.5 Republican Midwest
## 2 Maine 747927 -2.96 44.9 Democratic Northeast
## 3 Colorado 2780247 -4.91 43.2 Democratic West
## 4 Indiana 2757965 19.0 56.5 Republican Midwest
## 5 Nevada 1125385 -2.42 45.5 Democratic West
# Hex color codes for Dem Blue and Rep Red
party_colors <- c("#2E74C0", "#CB454A")
p0 <- ggplot(data = subset(election, st %nin% "DC"),
mapping = aes(x = r_points, y = reorder(state, r_points), color = party))
p1 <- p0 + geom_vline(xintercept = 0, color = "gray30") +
geom_point(size = 2)
p2 <- p1 + scale_color_manual(values = party_colors)
p3 <- p2 + scale_x_continuous(breaks = c(-30, -20, -10, 0, 10, 20, 30, 40),
labels = c("30\n(Clinton)", "20", "10", "0", "10", "20", "30", "40\n(Trump)"))
p3 + facet_wrap(~ census, ncol = 1, scales = "free_y") +
guides(color = FALSE) + labs(x = "Point Margin", y = NULL) +
theme(axis.text = element_text(size = 8))
library(maps)
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
us_states <- map_data("state")
head(us_states)
## long lat group order region subregion
## 1 -87.46201 30.38968 1 1 alabama <NA>
## 2 -87.48493 30.37249 1 2 alabama <NA>
## 3 -87.52503 30.37249 1 3 alabama <NA>
## 4 -87.53076 30.33239 1 4 alabama <NA>
## 5 -87.57087 30.32665 1 5 alabama <NA>
## 6 -87.58806 30.32665 1 6 alabama <NA>
dim(us_states)
## [1] 15537 6
p <- ggplot(data = us_states, mapping = aes(x = long, y = lat, group = group))
p + geom_polygon(fill = "white", color = "black")
p <- ggplot(data = us_states, mapping = aes(x = long, y = lat, group = group, fill = region))
p + geom_polygon(color = "gray90", size = 0.1) + guides(fill = FALSE)
p <- ggplot(data = us_states, mapping = aes(x = long, y = lat, group = group, fill = region))
p + geom_polygon(color = "gray90", size = 0.1) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45) + guides(fill = FALSE)
theme_map <- function(base_size = 9, base_family = NULL) {
require(grid)
theme_bw(base_size = base_size, base_family = base_family) %+replace%
theme(axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank(),
panel.background = element_blank(),
panel.border = element_blank(),
panel.grid = element_blank(),
panel.spacing = unit(0, "lines"),
plot.background = element_blank(),
legend.justification = c(0,0),
legend.position = c(0,0))
}
election$region <- tolower(election$state)
us_states_elec <- left_join(us_states, election, by = "region") # instead of: merge(us_states, election, sort = FALSE)
head(us_states_elec)
## long lat group order region subregion state st fips total_vote
## 1 -87.46201 30.38968 1 1 alabama <NA> Alabama AL 1 2123372
## 2 -87.48493 30.37249 1 2 alabama <NA> Alabama AL 1 2123372
## 3 -87.52503 30.37249 1 3 alabama <NA> Alabama AL 1 2123372
## 4 -87.53076 30.33239 1 4 alabama <NA> Alabama AL 1 2123372
## 5 -87.57087 30.32665 1 5 alabama <NA> Alabama AL 1 2123372
## 6 -87.58806 30.32665 1 6 alabama <NA> Alabama AL 1 2123372
## vote_margin winner party pct_margin r_points d_points pct_clinton
## 1 588708 Trump Republican 0.2773 27.72 -27.72 34.36
## 2 588708 Trump Republican 0.2773 27.72 -27.72 34.36
## 3 588708 Trump Republican 0.2773 27.72 -27.72 34.36
## 4 588708 Trump Republican 0.2773 27.72 -27.72 34.36
## 5 588708 Trump Republican 0.2773 27.72 -27.72 34.36
## 6 588708 Trump Republican 0.2773 27.72 -27.72 34.36
## pct_trump pct_johnson pct_other clinton_vote trump_vote johnson_vote
## 1 62.08 2.09 1.46 729547 1318255 44467
## 2 62.08 2.09 1.46 729547 1318255 44467
## 3 62.08 2.09 1.46 729547 1318255 44467
## 4 62.08 2.09 1.46 729547 1318255 44467
## 5 62.08 2.09 1.46 729547 1318255 44467
## 6 62.08 2.09 1.46 729547 1318255 44467
## other_vote ev_dem ev_rep ev_oth census
## 1 31103 0 9 0 South
## 2 31103 0 9 0 South
## 3 31103 0 9 0 South
## 4 31103 0 9 0 South
## 5 31103 0 9 0 South
## 6 31103 0 9 0 South
p0 <- ggplot(data = us_states_elec, mapping = aes(x = long, y = lat, group = group, fill = party))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p2 <- p1 + scale_fill_manual(values = party_colors) +
labs(title = "Election Results 2016", fill = NULL)
p2 + theme_map()
p0 <- ggplot(data = us_states_elec, mapping = aes(x = long, y = lat, group = group, fill = pct_trump))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p1 + labs(title = "Trump vote") + theme_map() + labs(fill = "Percent")
p2 <- p1 + scale_fill_gradient(low = "white", high = "#CB454A") +
labs(title = "Election Results 2016", fill = NULL)
p2 + labs(title = "Trump vote") + theme_map() + labs(fill = "Percent")
p0 <- ggplot(data = us_states_elec, mapping = aes(x = long, y = lat, group = group, fill = d_points))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p2 <- p1 + scale_fill_gradient2() + labs(title = "Winning margins")
p2 + theme_map() + labs(fill = "Percent")
p3 <- p1 + scale_fill_gradient2(low = "red", mid = scales::muted("purple"), high = "blue",
breaks = c(-25, 0, 25, 50, 75)) +
labs(title = "Winning margins")
p3 + theme_map() + labs(fill = "Percent")
p0 <- ggplot(data = subset(us_states_elec, region %nin% "district of columbia"),
mapping = aes(x = long, y = lat, group = group, fill = d_points))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p2 <- p1 + scale_fill_gradient2(low = "red", mid = scales::muted("purple"), high = "blue") +
labs(title = "Winning margins")
p2 + theme_map() + labs(fill = "Percent")
county_map %>% sample_n(5)
## long lat order hole piece group id
## 1 -1107851.6 285230.8 107574 FALSE 1 0500000US30063.1 30063
## 2 -871084.6 -680143.1 167197 FALSE 1 0500000US49037.1 49037
## 3 1725102.5 -1157391.9 45145 FALSE 1 0500000US13251.1 13251
## 4 459010.5 408867.1 92669 FALSE 1 0500000US27071.1 27071
## 5 893034.9 -1640496.7 72274 FALSE 1 0500000US22057.1 22057
county_data %>% select(id, name, state, pop_dens, pct_black) %>% sample_n(5)
## id name state pop_dens pct_black
## 1 31057 Dundy County NE [ 0, 10) [ 0.0, 2.0)
## 2 48339 Montgomery County TX [ 100, 500) [ 2.0, 5.0)
## 3 23009 Hancock County ME [ 10, 50) [ 0.0, 2.0)
## 4 22091 St. Helena Parish LA [ 10, 50) [50.0,85.3]
## 5 05049 Fulton County AR [ 10, 50) [ 0.0, 2.0)
head(county_data)
## id name state census_region pop_dens pop_dens4
## 1 0 <NA> <NA> <NA> [ 50, 100) [ 45, 118)
## 2 01000 1 AL South [ 50, 100) [ 45, 118)
## 3 01001 Autauga County AL South [ 50, 100) [ 45, 118)
## 4 01003 Baldwin County AL South [ 100, 500) [118,71672]
## 5 01005 Barbour County AL South [ 10, 50) [ 17, 45)
## 6 01007 Bibb County AL South [ 10, 50) [ 17, 45)
## pop_dens6 pct_black pop female white black travel_time land_area
## 1 [ 82, 215) [10.0,15.0) 318857056 50.8 77.7 13.2 25.5 3531905.43
## 2 [ 82, 215) [25.0,50.0) 4849377 51.5 69.8 26.6 24.2 50645.33
## 3 [ 82, 215) [15.0,25.0) 55395 51.5 78.1 18.4 26.2 594.44
## 4 [ 82, 215) [ 5.0,10.0) 200111 51.2 87.3 9.5 25.9 1589.78
## 5 [ 25, 45) [25.0,50.0) 26887 46.5 50.2 47.6 24.6 884.88
## 6 [ 25, 45) [15.0,25.0) 22506 46.0 76.3 22.1 27.6 622.58
## hh_income su_gun4 su_gun6 fips votes_dem_2016 votes_gop_2016 total_votes_2016
## 1 53046 <NA> <NA> 0 NA NA NA
## 2 43253 <NA> <NA> 1000 NA NA NA
## 3 53682 [11,54] [10,12) 1001 5908 18110 24661
## 4 50221 [11,54] [10,12) 1003 18409 72780 94090
## 5 32911 [ 5, 8) [ 7, 8) 1005 4848 5431 10390
## 6 36447 [11,54] [10,12) 1007 1874 6733 8748
## per_dem_2016 per_gop_2016 diff_2016 per_dem_2012 per_gop_2012 diff_2012
## 1 NA NA NA NA NA NA
## 2 NA NA NA NA NA NA
## 3 0.2395685 0.7343579 12202 0.2657577 0.7263374 11012
## 4 0.1956531 0.7735147 54371 0.2156657 0.7738975 47443
## 5 0.4666025 0.5227141 583 0.5125229 0.4833755 334
## 6 0.2142204 0.7696616 4859 0.2621857 0.7306638 3931
## winner partywinner16 winner12 partywinner12 flipped
## 1 <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA>
## 3 Trump Republican Romney Republican No
## 4 Trump Republican Romney Republican No
## 5 Trump Republican Obama Democrat Yes
## 6 Trump Republican Romney Republican No
county_full <- left_join(county_map, county_data, by = "id")
p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, fill = pop_dens, group = group))
p1 <- p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()
p2 <- p1 + scale_fill_brewer(palette = "Blues",
labels = c("0-10", "10-50", "50-100", "100-500", "500-1,000", "1,000-5,000", ">5,000"))
p2 + labs(fill = "Population per\nsquare mile") + theme_map() +
guides(fill = guide_legend(nrow = 1)) + theme(legend.position = "bottom")
p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, fill = pct_black, group = group))
p1 <- p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()
p2 <- p1 + scale_fill_brewer(palette = "Greens")
p2 + labs(fill = "US Population, Percent Black") + theme_map() +
guides(fill = guide_legend(nrow = 1)) + theme(legend.position = "bottom")
orange_pal <- RColorBrewer::brewer.pal(n = 6, name = "Oranges")
orange_pal
## [1] "#FEEDDE" "#FDD0A2" "#FDAE6B" "#FD8D3C" "#E6550D" "#A63603"
orange_rev <- rev(orange_pal)
orange_rev
## [1] "#A63603" "#E6550D" "#FD8D3C" "#FDAE6B" "#FDD0A2" "#FEEDDE"
gun_p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, fill = su_gun6, group = group))
gun_p1 <- gun_p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()
gun_p2 <- gun_p1 + scale_fill_manual(values = orange_pal)
gun_p2 + labs(title = "Gun-Related Suicides, 1999-2015", fill = "Rate per 100,000 pop") +
theme_map() + theme(legend.position = "bottom")
pop_p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, fill = pop_dens6, group = group))
pop_p1 <- pop_p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()
pop_p2 <- pop_p1 + scale_fill_manual(values = orange_rev)
pop_p2 + labs(title = "Reverse-coded Population Density", fill = "People per square mile") +
theme_map() + theme(legend.position = "bottom")
library(statebins)
p0 <- ggplot(data = election, mapping = aes(state = state, fill = pct_trump))
p0 + geom_statebins() + coord_equal() +
scale_fill_distiller(palette = "Reds", direction = 1) +
labs(fill = "Percent Trump") +
theme_statebins(legend_position = "top")
p0 <- ggplot(data = subset(election, st %nin% "DC"), mapping = aes(state = state, fill = pct_clinton))
p0 + geom_statebins(lbl_size = 4, light_lbl = "black") + coord_equal() +
scale_fill_distiller(palette = "Blues", direction = 1) +
labs(fill = "Percent Clinton") +
theme_statebins(legend_position = "top")
# Hex color codes for Dem Blue and Rep Red
party_colors <- c("#CB454A", "#2E74C0")
election <- election %>% mutate(color = recode(party, Republican = party_colors[2], Democratic = party_colors[1]))
p0 <- ggplot(data = election, mapping = aes(state = state, fill = color))
p0 + geom_statebins(lbl_size = 4) + coord_equal() +
scale_fill_manual(values = party_colors, labels = c("Trump", "Clinton")) +
labs(fill = "Winner") +
theme_statebins(legend_position = "right")
election$category <- cut(election$pct_trump, breaks=c(4, 21, 37, 53, 70),
labels = c("4-21", "21-37", "37-53", "53-70"))
model_colors <- RColorBrewer::brewer.pal(n = 4, name = "Reds")
p0 <- ggplot(data = election, mapping = aes(state = state, fill = category))
p0 + geom_statebins(lbl_size = 4) + coord_equal() +
scale_fill_manual(values = model_colors, labels = c("4-21", "21-37", "37-53", "53-70")) +
labs(fill = "Percent Trump") +
theme_statebins(legend_position = "top")
opiates
## # A tibble: 800 x 11
## year state fips deaths population crude adjusted adjusted_se region abbr
## <int> <chr> <int> <int> <int> <dbl> <dbl> <dbl> <ord> <chr>
## 1 1999 Alab~ 1 37 4430141 0.8 0.8 0.1 South AL
## 2 1999 Alas~ 2 27 624779 4.3 4 0.8 West AK
## 3 1999 Ariz~ 4 229 5023823 4.6 4.7 0.3 West AZ
## 4 1999 Arka~ 5 28 2651860 1.1 1.1 0.2 South AR
## 5 1999 Cali~ 6 1474 33499204 4.4 4.5 0.1 West CA
## 6 1999 Colo~ 8 164 4226018 3.9 3.7 0.3 West CO
## 7 1999 Conn~ 9 151 3386401 4.5 4.4 0.4 North~ CT
## 8 1999 Dela~ 10 32 774990 4.1 4.1 0.7 South DE
## 9 1999 Dist~ 11 28 570213 4.9 4.9 0.9 South DC
## 10 1999 Flor~ 12 402 15759421 2.6 2.6 0.1 South FL
## # ... with 790 more rows, and 1 more variable: division_name <chr>
opiates$region <- tolower(opiates$state)
opiates_map <- left_join(us_states, opiates, by = "region")
library(viridis)
## Loading required package: viridisLite
p0 <- ggplot(data = subset(opiates_map, year > 1999),
mapping = aes(x = long, y = lat, group = group, fill = adjusted))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.05) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p2 <- p1 + scale_fill_viridis_c(option = "plasma")
p2 + theme_map() + facet_wrap(~ year, ncol = 3) +
theme(legend.position = "bottom", strip.background = element_blank()) +
labs(fill = "Death rate per 100,000 population",
title = "Opiate Related Deaths by State, 2000-2014")
opiates_map$category <- cut_interval(opiates_map$adjusted, n = 5)
p0 <- ggplot(data = subset(opiates_map, year > 1999),
mapping = aes(x = long, y = lat, group = group, fill = category))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.05) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p2 <- p1 + scale_fill_viridis_d(option = "plasma")
p2 + theme_map() + facet_wrap(~ year, ncol = 3) +
theme(legend.position = "bottom", strip.background = element_blank()) +
labs(fill = "Death rate per 100,000 population",
title = "Opiate Related Deaths by State, 2000-2014")
library(ggrepel)
p <- ggplot(data = opiates, mapping = aes(x = year, y = adjusted, group = state))
p + geom_line(color = "gray70")
## Warning: Removed 17 row(s) containing missing values (geom_path).
p0 <- ggplot(data = drop_na(opiates, division_name), mapping = aes(x = year, y = adjusted))
p1 <- p0 + geom_line(color = "gray70", mapping = aes(group = state))
p2 <- p1 + geom_smooth(mapping = aes(group = division_name), se = FALSE)
p3 <- p2 + geom_text_repel(data = subset(opiates, year == max(year) & abbr != "DC"),
mapping = aes(x = year, y = adjusted, label = abbr),
size = 1.8, segment.color = NA, nudge_x = 30) +
coord_cartesian(c(min(opiates$year), max(opiates$year)))
p3 + labs(x = NULL, y = "Rate per 100,000 population",
title = "State-Level Opiate Death Rates by Census Division, 1999-2014") +
facet_wrap(~ reorder(division_name, - adjusted, na.rm = TRUE), nrow = 3)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 27 rows containing non-finite values (stat_smooth).
## Warning: Removed 17 row(s) containing missing values (geom_path).
p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, fill = black, group = group))
p1 <- p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()
p2 <- p1 + scale_fill_viridis_c(option = "plasma")
p2 + labs(fill = "US Population, Percent Black") + theme_map() +
guides(fill = guide_legend(nrow = 1)) + theme(legend.position = "bottom")
p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, fill = pop/land_area, group = group))
p1 <- p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()
p2 <- p1 + scale_fill_viridis_c(option = "inferno")
p2 + labs(fill = "Population per square mile") + theme_map() +
guides(fill = guide_legend(nrow = 1)) + theme(legend.position = "bottom")
p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, fill = pop/land_area, group = group))
p1 <- p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()
p2 <- p1 + scale_fill_viridis_c(option = "inferno", breaks = c(1, 10, 100, 1000, 1000, 10000, 70000),
trans = scales::pseudo_log_trans(sigma = 0.001))
p2 + labs(fill = "Population per square mile") + theme_map() +
guides(fill = guide_legend(nrow = 1)) + theme(legend.position = "bottom")
library(ggrepel)
head(asasec)
## Section Sname Beginning Revenues
## 1 Aging and the Life Course (018) Aging 12752 12104
## 2 Alcohol, Drugs and Tobacco (030) Alcohol/Drugs 11933 1144
## 3 Altruism and Social Solidarity (047) Altruism 1139 1862
## 4 Animals and Society (042) Animals 473 820
## 5 Asia/Asian America (024) Asia 9056 2116
## 6 Body and Embodiment (048) Body 3408 1618
## Expenses Ending Journal Year Members
## 1 12007 12849 No 2005 598
## 2 400 12677 No 2005 301
## 3 1875 1126 No 2005 NA
## 4 1116 177 No 2005 209
## 5 1710 9462 No 2005 365
## 6 1920 3106 No 2005 NA
p <- ggplot(data = subset(asasec, Year == 2014), mapping = aes(x = Members, y = Revenues, label = Sname))
p + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
p <- ggplot(data = subset(asasec, Year == 2014), mapping = aes(x = Members, y = Revenues, label = Sname))
p + geom_point(mapping = aes(color = Journal)) + geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'
p0 <- ggplot(data = subset(asasec, Year == 2014), mapping = aes(x = Members, y = Revenues, label = Sname))
p1 <- p0 + geom_point(mapping = aes(color = Journal)) + geom_smooth(method = "lm", se = FALSE, color = "gray80")
p2 <- p1 + geom_text_repel(data = subset(asasec, Year == 2014 & Revenues > 7000), size = 3)
p3 <- p2 + labs(x = "Membership", y = "Revenues", coloe = "Section has own Journal",
title = "ASA Sections", subtitle = "2014 Calendar year.", caption = "Source: ASA annual report.")
p4 <- p3 + scale_y_continuous(labels = scales::dollar) +
theme(legend.position = "bottom")
p4
## `geom_smooth()` using formula 'y ~ x'
p <-ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = world))
p + geom_point(size = 2) + scale_color_brewer(palette = "Set2") +
theme(legend.position = "top")
## Warning: Removed 46 rows containing missing values (geom_point).
p + geom_point(size = 2) + scale_color_brewer(palette = "Pastel2") +
theme(legend.position = "top")
## Warning: Removed 46 rows containing missing values (geom_point).
p + geom_point(size = 2) + scale_color_brewer(palette = "Dark2") +
theme(legend.position = "top")
## Warning: Removed 46 rows containing missing values (geom_point).
#demo('colors')
cb_palette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
p4 + scale_color_manual(values = cb_palette) # also possible: scale_color_hue()
## `geom_smooth()` using formula 'y ~ x'
library(dichromat) # instead: library(colorblindr)
Default <- RColorBrewer::brewer.pal(5, "Set2")
types <- c("deutan", "protan", "tritan")
names(types) <- c("Deuteronopia", "Protanopia", "Tritanopia")
color_table <- types %>% purrr::map(~ dichromat(Default, .x)) %>%
as_tibble() %>% add_column(Default, .before = TRUE)
color_table
## # A tibble: 5 x 4
## Default Deuteronopia Protanopia Tritanopia
## <chr> <chr> <chr> <chr>
## 1 #66C2A5 #AEAEA7 #BABAA5 #82BDBD
## 2 #FC8D62 #B6B661 #9E9E63 #F29494
## 3 #8DA0CB #9C9CCB #9E9ECB #92ABAB
## 4 #E78AC3 #ACACC1 #9898C3 #DA9C9C
## 5 #A6D854 #CACA5E #D3D355 #B6C8C8
color_comp(color_table)
# Democrat Blue and Republican Red
party_colors <- c("#2E74C0", "#CB454A")
p0 <- ggplot(data = subset(county_data, flipped == "No"),
mapping = aes(x = pop, y = black/100))
p1 <- p0 + geom_point(alpha = 0.15, color = "gray50") +
scale_x_log10(labels = scales::comma)
p1
p2 <- p1 + geom_point(data = subset(county_data, flipped == "Yes"),
mapping = aes(x = pop, y = black/100, color = partywinner16)) +
scale_color_manual(values = party_colors)
p2
p3 <- p2 + scale_y_continuous(labels = scales::percent) +
labs(color = "County flipped to ...", x = "County Population(log scale)", y = "Percent Black Population",
title = "Flipped counties, 2016", caption = "Counties in gray did not flip.")
p3
p4 <- p3 + geom_text_repel(data = subset(county_data, flipped == "Yes" & black > 25),
mapping = aes(x = pop, y = black/100, label = state), size = 2)
p4 + theme_minimal() +
theme(legend.position = "top")
theme_set(theme_bw()) # also: theme_classic(), theme_gray() and default theme_grey()
p4 + theme(legend.position = "top")
theme_set(theme_dark())
p4 + theme(legend.position = "top")
library(ggthemes) # also: library(cowplot) and library(hrbrthemes)
##
## Attaching package: 'ggthemes'
## The following object is masked _by_ '.GlobalEnv':
##
## theme_map
theme_set(theme_economist())
p4 + theme(legend.position = "top")
theme_set(theme_wsj())
p4 + theme(legend.position = "top")
p4 + theme(plot.title = element_text(size = rel(0.6)),
legend.title = element_text(size = rel(0.35)),
plot.caption = element_text(size = rel(0.35)),
legend.position = "top")
theme_set(theme_grey())
p4 + theme(legend.position = "top")
p4 + theme(legend.position = "top",
plot.title = element_text(size = rel(2),
lineheight = .5,
family = "Times",
face = "bold.italic",
color = "orange"),
axis.text.x = element_text(size = rel(1),
family = "Courier",
face = "bold",
color = "purple"))
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): rodzina
## czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): rodzina
## czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows
yrs <- c(seq(1972, 1988, 4), 1993, seq(1996, 2016, 4))
mean_age <- gss_lon %>%
filter(age %nin% NA && year %in% yrs) %>%
group_by(year) %>%
summarize(xbar = round(mean(age, na.rm = TRUE), 0))
mean_age$y <- 0.3
yr_labs <- data.frame(x = 85, y = 0.8, year = yrs)
p <- ggplot(data = subset(gss_lon, year %in% yrs),
mapping = aes(x = age))
p1 <- p + geom_density(fill = "gray20", color = FALSE, alpha = 0.9, mapping = aes(y = ..scaled..)) +
geom_vline(data = subset(mean_age, year %in% yrs),
aes(xintercept = xbar), color = "white", size = 0.5) +
geom_text(data = subset(mean_age, year %in% yrs),
aes(x = xbar, y = y, label = xbar), nudge_x = 7.5, color = "white", size = 3.5, hjust = 1) +
geom_text(data = subset(yr_labs, year %in% yrs),
aes(x = x, y = y, label = year)) +
facet_grid(year ~ ., switch = "y")
p1
## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.
## Warning: Removed 83 rows containing non-finite values (stat_density).
library(hrbrthemes)
#library(myriad)
p1 + theme_ipsum(base_size = 10, plot_title_size = 10, # with library(myriad): theme_book, panel_spacing = unit(0.1, "lines")
strip_text_size = 32) +
theme(plot.title = element_text(size = 16),
axis.text.x = element_text(size = 12),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
strip.background = element_blank(),
strip.text.y = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
labs(x = "Age", y = NULL, title = "Age Distribution of\nGSS Respondents")
## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.
## Warning: Removed 83 rows containing non-finite values (stat_density).
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): rodzina
## czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): rodzina
## czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): rodzina
## czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows
library(ggridges)
p <- ggplot(data = gss_lon,
mapping = aes(x = age, y = factor(year, levels = rev(unique(year)), ordered = TRUE)))
p + geom_density_ridges(alpha = 0.6, fill = "lightblue", scale = 1.5) +
scale_x_continuous(breaks = c(25, 50, 75)) +
scale_y_discrete(expand = c(0.01, 0)) +
labs(x = "Age", y = NULL, title = "Age Distribution of\nGSS Respondents") +
theme_ridges() +
theme(title = element_text(size = 16, face = "bold"))
## Picking joint bandwidth of 3.49
## Warning: Removed 221 rows containing non-finite values (stat_density_ridges).
head(fredts)
## date sp500 monbase sp500_i monbase_i
## 1 2009-03-11 696.68 1542228 100.0000 100.0000
## 2 2009-03-18 766.73 1693133 110.0548 109.7849
## 3 2009-03-25 799.10 1693133 114.7012 109.7849
## 4 2009-04-01 809.06 1733017 116.1308 112.3710
## 5 2009-04-08 830.61 1733017 119.2240 112.3710
## 6 2009-04-15 852.21 1789878 122.3245 116.0579
fredts_m <- fredts %>% select(date, sp500_i, monbase_i) %>% gather(key = series, value = score, sp500_i:monbase_i)
head(fredts_m)
## date series score
## 1 2009-03-11 sp500_i 100.0000
## 2 2009-03-18 sp500_i 110.0548
## 3 2009-03-25 sp500_i 114.7012
## 4 2009-04-01 sp500_i 116.1308
## 5 2009-04-08 sp500_i 119.2240
## 6 2009-04-15 sp500_i 122.3245
p <- ggplot(data = fredts_m, mapping = aes(x = date, y = score, group = series, color = series))
p1 <- p + geom_line() + theme(legend.position = "top") +
labs(x = NULL, y = "Index", color = "Series")
p <- ggplot(data = fredts, mapping = aes(x = date, y = sp500_i - monbase_i))
p2 <- p + geom_line() +
labs(x = "Date", y = "Difference")
cowplot::plot_grid(p1, p2, nrow = 2, rel_heights = c(0.75, 0.25), align = "v") # also: library(gridExtra) with grid.arrange()
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
head(yahoo)
## # A tibble: 6 x 4
## Year Revenue Employees Mayer
## <dbl> <dbl> <dbl> <chr>
## 1 2004 3574 7600 No
## 2 2005 5257 9800 No
## 3 2006 6425 11400 No
## 4 2007 6969 14300 No
## 5 2008 7208 13600 No
## 6 2009 6460 13900 No
p <- ggplot(data = yahoo, mapping = aes(x = Employees, y = Revenue))
p + geom_path(color = "gray80") +
geom_text(aes(color = Mayer, label = Year), size = 3, fontface = "bold") +
theme(legend.position = "bottom") +
labs(color = "Mayer is CEO", x = "Employees", y = "Revenue (Millions)",
title = "Yahoo Employees vs Revenues, 2004-2014") +
scale_y_continuous(labels = scales::dollar) +
scale_x_continuous(labels = scales::comma)
p <- ggplot(data = yahoo, mapping = aes(x = Year, y = Revenue/Employees))
p + geom_vline(xintercept = 2012) +
geom_line(color = "gray60", size = 2) +
annotate("text", x = 2013, y = 0.44, label = "Mayer becomes CEO", size = 3) +
labs(x = "Year", y = "Revenue/Employees", title = "Yahoo Revenue to Employee Ratio, 2004-2014")
head(studebt)
## # A tibble: 6 x 4
## Debt type pct Debtrc
## <ord> <fct> <int> <ord>
## 1 Under $5 Borrowers 20 Under $5
## 2 $5-$10 Borrowers 17 $5-$10
## 3 $10-$25 Borrowers 28 $10-$25
## 4 $25-$50 Borrowers 19 $25-$50
## 5 $50-$75 Borrowers 8 $50-$75
## 6 $75-$100 Borrowers 3 $75-$100
studebt
## # A tibble: 16 x 4
## Debt type pct Debtrc
## <ord> <fct> <int> <ord>
## 1 Under $5 Borrowers 20 Under $5
## 2 $5-$10 Borrowers 17 $5-$10
## 3 $10-$25 Borrowers 28 $10-$25
## 4 $25-$50 Borrowers 19 $25-$50
## 5 $50-$75 Borrowers 8 $50-$75
## 6 $75-$100 Borrowers 3 $75-$100
## 7 $100-$200 Borrowers 4 $100-$200
## 8 Over $200 Borrowers 1 Over $200
## 9 Under $5 Balances 2 Under $5
## 10 $5-$10 Balances 4 $5-$10
## 11 $10-$25 Balances 15 $10-$25
## 12 $25-$50 Balances 23 $25-$50
## 13 $50-$75 Balances 16 $50-$75
## 14 $75-$100 Balances 10 $75-$100
## 15 $100-$200 Balances 19 $100-$200
## 16 Over $200 Balances 11 Over $200
p_xlab <- "Amount Owed, in thousands of Dollars"
p_title <- "Outstanding Student Loans"
p_subtitle <- "44 million borrowers owe a total of $1.3 trillion"
p_caption <- "Source: FRB NY"
f_labs <- c(`Borrowers` = "Percent of\nall Borrowers",
`Balances` = "Percent of\nall Balances")
p <- ggplot(data = studebt, mapping = aes(x = Debt, y = pct/100, fill = type))
p + geom_bar(stat = "identity") +
scale_fill_brewer(type = "qual", palette = "Dark2") +
scale_y_continuous(labels = scales::percent) +
guides(fill = FALSE) +
theme(strip.text.x = element_text(face = "bold")) +
labs(y = NULL, x = p_xlab, caption = p_caption, title = p_title, subtitle = p_subtitle) +
facet_grid(~ type, labeller = as_labeller(f_labs)) +
coord_flip()
library(viridis)
p <- ggplot(studebt, aes(y = pct/100, x = type, fill = Debtrc))
p + geom_bar(stat = "identity", color = "gray80") +
scale_x_discrete(labels = as_labeller(f_labs)) +
scale_y_continuous(labels = scales::percent) +
scale_fill_viridis(discrete = TRUE) +
guides(fill = guide_legend(reverse = TRUE,
title.position = "top", label.position = "bottom", keywidth = 3, nrow = 1)) +
labs(x = NULL, y = NULL, fill = "Amount Owed, in thousands of dollars",
caption = p_caption, title = p_title, subtitle = p_subtitle) +
theme(legend.position = "top",
axis.text.y = element_text(face = "bold", hjust = 1, size = 10),
axis.ticks.length = unit(0, "cm"),
panel.grid.major.y = element_blank()) +
coord_flip()
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
your_numbers <- c(5, 31, 71, 1, 3, 21, 6)
my_numbers[4]
## [1] 1
my_numbers[7]
## [1] 25
my_numbers[2:4]
## [1] 2 3 1
my_numbers[c(2,4)]
## [1] 2 1
my_tb <- tibble(mine = c(1, 4, 5, 8:11), yours = c(3, 20, 16, 34:31))
class(my_tb)
## [1] "tbl_df" "tbl" "data.frame"
my_tb
## # A tibble: 7 x 2
## mine yours
## <dbl> <dbl>
## 1 1 3
## 2 4 20
## 3 5 16
## 4 8 34
## 5 9 33
## 6 10 32
## 7 11 31
my_tb[3, 1] # row 3 col 1
## # A tibble: 1 x 1
## mine
## <dbl>
## 1 5
my_tb[1, 2] # row 1 col 2
## # A tibble: 1 x 1
## yours
## <dbl>
## 1 3
my_tb[3, "mine"] # row 3 col 1
## # A tibble: 1 x 1
## mine
## <dbl>
## 1 5
my_tb[1, "yours"] # row 1 col 2
## # A tibble: 1 x 1
## yours
## <dbl>
## 1 3
my_tb[, "mine"] # all rows col 1
## # A tibble: 7 x 1
## mine
## <dbl>
## 1 1
## 2 4
## 3 5
## 4 8
## 5 9
## 6 10
## 7 11
my_tb[4, ] # row 4 all cols
## # A tibble: 1 x 2
## mine yours
## <dbl> <dbl>
## 1 8 34
my_tb$mine
## [1] 1 4 5 8 9 10 11
out <- lm(mine ~ yours, data = my_tb)
out$coefficients
## (Intercept) yours
## -0.08011921 0.28734222
out$call
## lm(formula = mine ~ yours, data = my_tb)
out$qr$rank # nested
## [1] 2
my_tb$ours <- my_tb$mine + my_tb$yours
my_tb
## # A tibble: 7 x 3
## mine yours ours
## <dbl> <dbl> <dbl>
## 1 1 3 4
## 2 4 20 24
## 3 5 16 21
## 4 8 34 42
## 5 9 33 42
## 6 10 32 42
## 7 11 31 42
edu
## # A tibble: 366 x 11
## age sex year total elem4 elem8 hs3 hs4 coll3 coll4 median
## <chr> <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 25-34 Male 2016 21845 116 468 1427 6386 6015 7432 NA
## 2 25-34 Male 2015 21427 166 488 1584 6198 5920 7071 NA
## 3 25-34 Male 2014 21217 151 512 1611 6323 5910 6710 NA
## 4 25-34 Male 2013 20816 161 582 1747 6058 5749 6519 NA
## 5 25-34 Male 2012 20464 161 579 1707 6127 5619 6270 NA
## 6 25-34 Male 2011 20985 190 657 1791 6444 5750 6151 NA
## 7 25-34 Male 2010 20689 186 641 1866 6458 5587 5951 NA
## 8 25-34 Male 2009 20440 184 695 1806 6495 5508 5752 NA
## 9 25-34 Male 2008 20210 172 714 1874 6356 5277 5816 NA
## 10 25-34 Male 2007 20024 246 757 1930 6361 5137 5593 NA
## # ... with 356 more rows
edu_t <- gather(data = edu, key = school, value = freq, elem4:coll4)
head(edu_t)
## # A tibble: 6 x 7
## age sex year total median school freq
## <chr> <chr> <int> <int> <dbl> <chr> <dbl>
## 1 25-34 Male 2016 21845 NA elem4 116
## 2 25-34 Male 2015 21427 NA elem4 166
## 3 25-34 Male 2014 21217 NA elem4 151
## 4 25-34 Male 2013 20816 NA elem4 161
## 5 25-34 Male 2012 20464 NA elem4 161
## 6 25-34 Male 2011 20985 NA elem4 190
tail(edu_t)
## # A tibble: 6 x 7
## age sex year total median school freq
## <chr> <chr> <int> <int> <dbl> <chr> <dbl>
## 1 55> Female 1959 16263 8.3 coll4 688
## 2 55> Female 1957 15581 8.2 coll4 630
## 3 55> Female 1952 13662 7.9 coll4 628
## 4 55> Female 1950 13150 8.4 coll4 436
## 5 55> Female 1947 11810 7.6 coll4 343
## 6 55> Female 1940 9777 8.3 coll4 219
head(bad_date)
## # A tibble: 6 x 2
## date N
## <chr> <int>
## 1 9/1/11 44426
## 2 9/2/11 55112
## 3 9/3/11 19263
## 4 9/4/11 12330
## 5 9/5/11 8534
## 6 9/6/11 59490
p <- ggplot(data = bad_date, aes(x = date, y = N))
p + geom_line()
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
bad_date2 <- rbind(bad_date, bad_date)
p <- ggplot(data = bad_date2, aes(x = date, y = N))
p + geom_line()
#install.packages('lubridate')
library('lubridate')
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
bad_date3 <- bad_date
bad_date3$date <- mdy(bad_date$date)
head(bad_date3)
## # A tibble: 6 x 2
## date N
## <date> <int>
## 1 2011-09-01 44426
## 2 2011-09-02 55112
## 3 2011-09-03 19263
## 4 2011-09-04 12330
## 5 2011-09-05 8534
## 6 2011-09-06 59490
p <- ggplot(data = bad_date3, aes(x = date, y = N))
p + geom_line()
url <- "https://cdn.rawgit.com/kjhealy/viz-organdata/master/organdonation.csv"
bad_year <- read_csv(url)
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## country = col_character(),
## world = col_character(),
## opt = col_character(),
## consent.law = col_character(),
## consent.practice = col_character(),
## consistent = col_character(),
## ccode = col_character()
## )
## i Use `spec()` for the full column specifications.
bad_year %>% select(1:3) %>% sample_n(10)
## # A tibble: 10 x 3
## country year donors
## <chr> <dbl> <dbl>
## 1 United States 2000 21.2
## 2 Belgium NA NA
## 3 Switzerland 1996 12.6
## 4 Ireland 1997 20.9
## 5 Norway 2001 14.4
## 6 Australia 1994 10.2
## 7 Switzerland 1997 14.3
## 8 Italy 1997 11.6
## 9 France 1995 15.1
## 10 Belgium 1994 22.8
p <- ggplot(data = bad_year, aes(x = year, y = donors))
p + geom_point()
## Warning: Removed 34 rows containing missing values (geom_point).
bad_year$year <- int_to_year(bad_year$year)
bad_year %>% select(1:3)
## # A tibble: 238 x 3
## country year donors
## <chr> <date> <dbl>
## 1 Australia NA NA
## 2 Australia 1991-06-15 12.1
## 3 Australia 1992-06-15 12.4
## 4 Australia 1993-06-15 12.5
## 5 Australia 1994-06-15 10.2
## 6 Australia 1995-06-15 10.2
## 7 Australia 1996-06-15 10.6
## 8 Australia 1997-06-15 10.3
## 9 Australia 1998-06-15 10.5
## 10 Australia 1999-06-15 8.67
## # ... with 228 more rows
p <- ggplot(data = bad_year, aes(x = year, y = donors))
p + geom_point()
## Warning: Removed 34 rows containing missing values (geom_point).
add_xy <- function(x, y) {
x + y
}
add_xy(x = 1, y = 7)
## [1] 8
add_xy(5, 2)
## [1] 7
plot_section <- function(section = "Culture", x = "Year", y = "Members", data = asasec, smooth = FALSE) {
require(ggplot2)
require(splines)
p <- ggplot(subset(data, Sname == section), mapping = aes_string(x = x, y = y)) # aes_string() instead of aes()
if(smooth == TRUE) {
p0 <- p + geom_smooth(color = "#999999", size = 1.2, method = "lm", formula = y ~ ns(x, 3)) +
scale_x_continuous(breaks = c(seq(2005, 2015, 4))) +
labs(title = section)
} else {
p0 <- p + geom_line(color = "#E69F00", size = 1.2) +
scale_x_continuous(breaks = c(seq(2005, 2015, 4))) +
labs(title = section)
}
print(p0)
}
plot_section("Rationality")
## Loading required package: splines
plot_section("Sexualities", smooth = TRUE)
plot_section <- function(section = "Culture", x = "Year", y = "Members", data = asasec, smooth = FALSE, ...) {
require(ggplot2)
require(splines)
p <- ggplot(subset(data, Sname == section), mapping = aes_string(x = x, y = y)) # aes_string() instead of aes()
if(smooth == TRUE) {
p0 <- p + geom_smooth(color = "#999999", size = 1.2, ...) +
scale_x_continuous(breaks = c(seq(2005, 2015, 4))) +
labs(title = section)
} else {
p0 <- p + geom_line(color = "#E69F00", size = 1.2) +
scale_x_continuous(breaks = c(seq(2005, 2015, 4))) +
labs(title = section)
}
print(p0)
}
plot_section("Comm/Urban", smooth = TRUE, method = "loess")
## `geom_smooth()` using formula 'y ~ x'
plot_section("Children", smooth = TRUE, method = "lm", formula = y ~ ns(x, 2))